Merging TiVo Transport Stream Files
TL;DR:
Read this if you’re a developer who has had trouble downloading an error-free MPEG-4 Transport Stream file (i.e., a .ts file) from your TiVo. This describes how someone could write a tool to fix the transport stream errors you encounter.
Background
My cable provider recently switched all its channels to MPEG-4. This means that all of my TiVo's recordings must now be downloaded in Transport Stream format (as opposed to MPEG-2 video in Program Stream format).
Downloading a Program Stream from my TiVo via kmttg works (worked) great! That had been my modus operandi for years now. I would download the .TiVo file and have kmttg decrypt it to a .mpg file using TivoLibre. If I try to download an MPEG-4 video from the TiVo in the same way, however, it won't have any Video stream in it. I have to download it as a Transport Stream, and the resultant .ts file often has errors in it.
As best I could tell, the only solution to this transport stream problem has been to re-download the recording over and over again to see if it gets better. It might help to put the TiVo in Standby mode first.
I wanted something better.
For the text below, note the following terminology:
- "program". Refers to a logical program on the TiVo that I want to download. For example, a one-hour TV show.
- "recording". Refers to the program (or some portion thereof) that was recorded on the TiVo. It will be the file recorded from one of TiVo's tuners and at a certain start time. If I record the same thing on two different tuners, they will be different recordings.
- "download". Refers to a recording I've downloaded from the TiVo. I can have multiple "downloads" for the same "recording".
My results
I can download a TiVo MPEG-4 recording multiple times as a Transport Stream and then merge those downloaded files to get rid of packet errors. I can also merge recordings that overlap to create a single program. For example, I can merge a recording from 8:00p to 8:35p on channel 105 with a recording from 8:30p to 9:00p on the same channel and output a .ts that is one hour long.
As a last resort, I can skip over bad packets that stubbornly appear at the same timestamp across multiple downloads. Doing that loses about 0.3s to 2s of the recording in the output.
Heads up
While I have exhaustively implemented the solution for this (and it works!), it is in no way presentable to the public. It's thousands of lines of code that I'm not prepared to open source. I wish I could say I'm in the process of publishing it, but I'm not. The time involved would be in productizing it, getting a copyright release from my employer, etc. I've hit my limit of how much time my family is willing to let me spend working on this. :)
My hope is that somebody else will take up this mantle, based on the technical details I'm presenting here, and write a solution that I can maybe download and use instead of mine.
I should also note that my solution is incredibly slow compared to TivoDecoder.jar and tivoconvert. You probably wouldn't want it anyway. It can take over 30 minutes to merge two one-hour .TiVo files into a .ts output file. And it's written in Perl, with the Turing decryption written in inlined C.
TiVo transport stream observations
Turing state is reset with every TiVo packet. This means I can jump into the stream and start decrypting from any TiVo packet. There's a TiVo packet every 2-3 seconds.
The encryption is different for every download of a TiVo file (even with the same recording). Presumably, the TiVo does its encryption on the fly.
All Audio packets are encrypted. There's nothing more to say about this.
Only the first of every four Video packets is encrypted. Sometimes, this packet encryption is missed and the following Video packet will be encrypted instead -- in which case the next three Video packets won't be encrypted. In other words, it doesn't "make up" for missed encryption; it starts the pattern over again from there.
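The four-packet pattern can be modeled with a small counter. The sketch below is a hypothetical Python helper, not code from my tool, and `VideoEncryptionTracker` is an illustrative name. It predicts whether the next Video packet should be encrypted. It assumes the caller reports whether each packet was *actually* encrypted, which is not always what the packet's flag says (see the bug described next).

```python
class VideoEncryptionTracker:
    """Hypothetical helper: predicts whether the next Video packet is encrypted.

    Models the observed pattern: the first of every four Video packets is
    encrypted; if encryption shows up late, the cycle restarts from the packet
    that actually was encrypted instead of "making up" for the miss.
    """

    def __init__(self) -> None:
        # How many upcoming Video packets are expected to be unencrypted.
        # 0 means the next Video packet is expected to be encrypted.
        self.clear_remaining = 0

    def expects_encrypted(self) -> bool:
        return self.clear_remaining == 0

    def observe(self, encrypted: bool) -> None:
        """Record whether the Video packet just processed was really encrypted."""
        if encrypted:
            self.clear_remaining = 3  # the next three should be unencrypted
        elif self.clear_remaining > 0:
            self.clear_remaining -= 1
        # Otherwise encryption was "missed": keep expecting it on the next packet.


tracker = VideoEncryptionTracker()
expected = []
for was_encrypted in [True, False, False, False, False, True, False]:
    expected.append(tracker.expects_encrypted())
    tracker.observe(was_encrypted)
print(expected)  # [True, False, False, False, True, True, False]
```

In this example, the fifth packet was expected to be encrypted but wasn't. The tracker keeps expecting encryption until the sixth packet, and the cycle then restarts from there.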
I sometimes see a TiVo file where every Video packet is marked as encrypted. They are NOT all encrypted, however. I don't know if this is the TiVo's fault, or if this is due to my cable provider (Comcast, in my case). But it's clearly a bug, and it causes TivoDecoder to produce a corrupt output file when this happens.
Multiple recordings (and downloads) of the same program have identical Audio and Video packets after decryption. Note that this is across multiple recordings, not just downloads. Modern TiVos simply record the digital data from the tuners; they don't transcode it. This is why I can merge intersecting recordings.
TiVo packets appear at the same timestamp across multiple recordings and downloads of the same program. They will not be byte-for-byte identical, however, because they contain different decryption data.
PAT and PMT packets will be at different timestamps in the transport stream across different recordings, although they will otherwise be identical. Note that they will be at the same timestamp across different downloads.
Occasionally a recording will be missing a Video/Audio packet. I presume that happens when a packet gets dropped by the provider or the TiVo's tuner.
The Continuity Counter in the transport stream packets is reliable (as specified in the Transport Stream spec, ISO/IEC 13818-1). I can detect bad or missing packets by tracking that counter for each PID. Occasionally, though, I see a duplicate count, in which case the following count for that PID skips ahead by one.
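A per-PID continuity check could look like the following Python sketch. The header layout comes from the TS spec: a 13-bit PID, a 2-bit adaptation_field_control, and a 4-bit continuity_counter that only advances on packets carrying a payload. The extra tolerance after a duplicate mirrors the quirk I just described. `parse_header` and `ContinuityChecker` are hypothetical names.

```python
PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF


def parse_header(packet: bytes) -> tuple[int, int, bool]:
    """Return (pid, continuity_counter, has_payload) for one TS packet."""
    if len(packet) != PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte transport stream packet")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    adaptation_field_control = (packet[3] >> 4) & 0x03
    has_payload = adaptation_field_control in (1, 3)
    return pid, packet[3] & 0x0F, has_payload


class ContinuityChecker:
    """Hypothetical checker: classifies packets as first/ok/duplicate/gap per PID."""

    def __init__(self) -> None:
        self.last_cc: dict[int, int] = {}
        self.after_duplicate: set[int] = set()

    def check(self, packet: bytes) -> str:
        pid, cc, has_payload = parse_header(packet)
        if pid == NULL_PID:
            return "ok"  # null packets don't carry a meaningful counter
        last = self.last_cc.get(pid)
        if last is None:
            self.last_cc[pid] = cc
            return "first"
        if not has_payload:
            # The counter does not advance on packets without a payload.
            return "ok" if cc == last else "gap"
        allowed = {(last + 1) % 16}
        if pid in self.after_duplicate:
            # Observed quirk: after a duplicate, the next count may skip ahead by one.
            allowed.add((last + 2) % 16)
            self.after_duplicate.discard(pid)
        if cc == last:
            self.after_duplicate.add(pid)
            return "duplicate"
        self.last_cc[pid] = cc  # resynchronize on the new value either way
        return "ok" if cc in allowed else "gap"
```

Given counts 4, 5, 5, 7, 8, 10 on one PID, this reports first, ok, duplicate, ok, ok, gap. The jump from 8 to 10 is what would flag a bad or missing packet.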
When there are bad packets in the transport stream, I can download the recording again (and again), and oftentimes there are bad packets at the exact same position -- but not always!
I haven't yet recognized a pattern within the bytes of those bad packets. Every 188-byte packet must start with the byte 0x47. When a TiVo file doesn't have that byte in that position, it's bad data. As mentioned above, I've noticed that multiple downloads of the same recording often have bad data at the same timestamp in the file, but usually with different bad data. This makes me wonder whether it's encrypted data (which will naturally differ across downloads) or just garbage that TiVo threw in there for some reason.
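The bad data may not be a multiple of 188 bytes long. Finding the next good packet therefore means hunting for a 0x47 that also lines up with further 0x47 bytes every 188 bytes. Here's a minimal Python sketch of such a resync scan. It's an illustration, not my tool's code. The `confirm` parameter, which sets how many consecutive sync bytes must line up, is an assumption.

```python
PACKET_SIZE = 188
SYNC_BYTE = 0x47


def find_resync(data: bytes, start: int, confirm: int = 2) -> int:
    """Return the next offset >= start that looks like a packet boundary.

    A candidate 0x47 is accepted only if the bytes 188, 376, ... further on
    (up to `confirm` packets, as far as the data reaches) are also 0x47,
    because a lone 0x47 can just as easily be payload data.
    """
    i = data.find(SYNC_BYTE, start)
    while i != -1 and i + PACKET_SIZE <= len(data):
        probes = range(i, len(data), PACKET_SIZE)[:confirm]
        if all(data[j] == SYNC_BYTE for j in probes):
            return i
        i = data.find(SYNC_BYTE, i + 1)
    return len(data)


def scan_packets(data: bytes, confirm: int = 2):
    """Yield ("packet", offset) per packet and ("bad", start, end) for skipped bytes."""
    pos = 0
    while pos + PACKET_SIZE <= len(data):
        if data[pos] == SYNC_BYTE:
            yield ("packet", pos)
            pos += PACKET_SIZE
        else:
            start = pos
            pos = find_resync(data, pos + 1, confirm)
            yield ("bad", start, pos)
```

A packet that starts with 0x47 but has garbage after it slips through this scan. That's one reason the continuity counter check is still needed.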
I've noticed there's a pattern of similar timestamps for bad packets. They tend to be about 10 minutes apart, such as timestamps +10m, +30m, +40m, and +50m. It's not always the case, but I see it enough that I don't think it's a coincidence.
TiVo will bail on transferring a recording from pyTivo once it sees a bad packet. This is part of my motivation for making sure only valid packets end up in the output file.
My solution
Multiple downloads of the same recording
I loop through the downloads and read the next 188-byte packet from each file. I process the packet accordingly:
- If the packet holds the PAT or PMT, I parse that information.
- If the packet is a TiVo packet, I reset the Turing state from it.
- If the packet is encrypted, I decrypt the payload bytes with the current Turing state.
- If it's a bad packet (which usually means it doesn't start with byte 0x47), I flag that file as out-of-sync and skip ahead in that file until I find the next valid packet. I then skip ahead further until I find the next TiVo packet in the file, in order to reset the Turing state. I find this often results in skipping between 300ms and 1.6s of the recording. From here on, I compare packets from this out-of-sync file to valid files and clear the out-of-sync flag once five consecutive packets are the same as a valid file's packets.
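The out-of-sync bookkeeping in the last bullet might be sketched like this in Python. `InputState`, `mark_out_of_sync`, and `update_sync` are hypothetical names. The sketch also assumes the caller has already lined up the out-of-sync file's packet with the packets from the valid files.

```python
from dataclasses import dataclass

RESYNC_MATCHES = 5  # consecutive matching packets needed to trust an input again


@dataclass
class InputState:
    """Hypothetical per-download bookkeeping."""
    name: str
    out_of_sync: bool = False
    matches: int = 0


def mark_out_of_sync(state: InputState) -> None:
    state.out_of_sync = True
    state.matches = 0


def update_sync(state: InputState, packet: bytes, valid_packets: list[bytes]) -> None:
    """Count consecutive packets that equal some valid input's packet."""
    if not state.out_of_sync:
        return
    if packet in valid_packets:
        state.matches += 1
        if state.matches >= RESYNC_MATCHES:
            state.out_of_sync = False
            state.matches = 0
    else:
        state.matches = 0  # any mismatch restarts the count
```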
For the files that are valid (i.e., not out-of-sync), I then compare their packets byte-for-byte. If they are all the same, I write that packet to the output file and go to the next packet. If they are not, I resort to a few tricks to resolve the difference. I use the following algorithm, in order, to choose which file's packet to output:
- If the packets were decrypted, I compare the packet bytes from before decryption. This accounts for that occasional encoding bug I see where all Video packets are marked as encrypted (even though they're not). If the bytes are the same, I roll back both the Turing state and the packet bytes to what they were before, clear the "encrypted" bits in the packet, and then write those packet bytes to the output file. Note: I don't apply this hack before decryption, because a packet might look the same across downloads both before AND after decryption. With a one-byte payload, for example, there's a 1-in-256 chance that a byte duplicated across downloads would also decrypt to the same byte.
- I check the Continuity Counter of the packets. If the counter is as expected in just one file's packet, I write that packet to the output file. I then assume the other file(s) hit some kind of snag, flag them as out-of-sync, and read ahead in each to find the next TiVo packet (to make sure the Turing state is correct).
- I crash with an error. I initially wrote code to choose a packet arbitrarily from one of the files, but I didn't like that this could be inconsistent and leave unpredictable packets in the output file. Note that this only happens when multiple valid packets differ and can't be reconciled. Knock on wood, I no longer see this issue now that I've handled the edge cases.
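Here's a hedged Python sketch of that decision order. The `Candidate` fields are hypothetical. The sketch assumes each input has already determined whether its continuity counter was as expected. It also assumes the caller rolls back the Turing state and clears the encryption bits whenever `write_raw` comes back true.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical view of one valid download's current packet."""
    name: str
    raw: bytes        # bytes as read from the file, before any decryption
    output: bytes     # bytes after decryption (equal to raw if not encrypted)
    decrypted: bool   # whether this packet was decrypted
    cc_ok: bool       # whether its continuity counter was the expected value


class UnresolvedPacketError(Exception):
    pass


def choose_packet(cands: list[Candidate]) -> tuple[Candidate, bool, list[str]]:
    """Return (winner, write_raw, names_to_flag_out_of_sync)."""
    first = cands[0]
    if all(c.output == first.output for c in cands):
        return first, False, []
    # 1. Identical before decryption: most likely an errant "encrypted" flag.
    if all(c.decrypted for c in cands) and all(c.raw == first.raw for c in cands):
        return first, True, []
    # 2. Exactly one packet has the expected continuity counter.
    good = [c for c in cands if c.cc_ok]
    if len(good) == 1:
        return good[0], False, [c.name for c in cands if c is not good[0]]
    # 3. Refuse to guess rather than write an unpredictable packet.
    raise UnresolvedPacketError("cannot reconcile: " + ", ".join(c.name for c in cands))
```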
Okay, so that's how I can get a consistent output file with correct packets for a single recording, merged from multiple downloaded TiVo files.
But what if every download has bad data at the same place in the file? This presents itself when all the files are flagged as out-of-sync. In this case, I simply write the next valid packet from the file with the most contributions so far, clear that file's out-of-sync flag, and report how much was lost. I never see more than a few seconds of video lost.
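A sketch of that fallback in Python, assuming each input tracks two things: how many packets it has contributed so far, and the stream time of its next valid packet. `Download` and `recover` are hypothetical names, and the timing fields are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Download:
    """Hypothetical per-input state."""
    name: str
    contributed: int        # packets this input has written to the output so far
    next_valid_time: float  # stream time (seconds) of its next valid packet
    out_of_sync: bool = True


def recover(downloads: list[Download], last_written_time: float) -> tuple[Download, float]:
    """When every input is out-of-sync, resume from the biggest contributor.

    Returns the chosen input and how many seconds of the recording were lost.
    """
    if not all(d.out_of_sync for d in downloads):
        raise ValueError("only needed when every input is out-of-sync")
    chosen = max(downloads, key=lambda d: d.contributed)
    chosen.out_of_sync = False
    return chosen, chosen.next_valid_time - last_written_time
```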
Multiple recordings of the same program
To merge multiple recordings together that intersect, I have to account for the PAT, PMT, and TiVo packets that differ between them. I mark any download as out-of-sync from the get-go if its start time is later in the program.
I also added two more steps to the algorithm that decides what to do when packets across valid files are not byte-for-byte the same:
- If one of the packets is a PAT or PMT packet, I either write that packet to the output file or throw it away. I decide based on which file has contributed the most packets to the output file so far. I then make sure to reuse the other files' presumed non-PAT/PMT packets for the next round.
- If all of the packets are TiVo packets, I just choose one of them to write to the output file. I decide based on which file has contributed the most packets so far, but I don't imagine it matters which one. I could presumably even strip out all TiVo packets, because the decrypted output file shouldn't need them, but I figured I'd keep them for the heck of it.
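The two extra rules might look like this in Python. This is a sketch under two assumptions. First, the PMT PIDs come from the PAT. Second, the caller supplies the set of PIDs that carry TiVo packets; how to identify those is left open here. The function returns the packet to write (or None) plus the inputs whose packet was consumed. Every other input keeps its current packet for the next round.

```python
from typing import Optional

PAT_PID = 0x0000


def pid_of(packet: bytes) -> int:
    return ((packet[1] & 0x1F) << 8) | packet[2]


def resolve_special(
    packets: dict[str, bytes],
    contributions: dict[str, int],
    pmt_pids: set[int],
    tivo_pids: set[int],  # assumption: supplied by the caller
) -> Optional[tuple[Optional[bytes], set[str]]]:
    """Hypothetical resolver for mismatched packets across intersecting recordings.

    Returns (packet_to_write_or_None, consumed_inputs), or None if neither rule applies.
    """
    leader = max(packets, key=lambda name: contributions.get(name, 0))
    psi = {n for n, p in packets.items() if pid_of(p) == PAT_PID or pid_of(p) in pmt_pids}
    if psi:
        # Write the PAT/PMT only if the leading input has one here; otherwise drop it.
        return (packets[leader] if leader in psi else None), psi
    if all(pid_of(p) in tivo_pids for p in packets.values()):
        # All TiVo packets: which one gets written probably doesn't matter.
        return packets[leader], set(packets)
    return None
```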
Additional features
The above is my general algorithm. I ended up making a bunch of improvements for performance and convenience.
One such improvement was the ability to use .ts files as inputs to the merge. In particular, this allows me to easily append to an output file that was interrupted during an earlier run.
I also added a --start command-line option that skips to the given time in the recording and then rewinds to the most recent TiVo packet before proceeding through the file.
When my merging tool takes this long to run, I end up implementing things like this. :)
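The rewind step of --start could be sketched like this. It assumes a pre-built index of (stream time in seconds, byte offset, is-TiVo-packet) tuples sorted by time. Building that index (for example, from PCR or PTS values) is outside this sketch, and `start_offset` is a hypothetical name.

```python
import bisect


def start_offset(index: list[tuple[float, int, bool]], start: float) -> int:
    """Byte offset of the last TiVo packet at or before `start` seconds.

    `index` holds hypothetical (stream_time, byte_offset, is_tivo_packet)
    entries sorted by stream_time. Falls back to offset 0 if none is found.
    """
    times = [t for t, _, _ in index]
    i = bisect.bisect_right(times, start) - 1
    while i >= 0 and not index[i][2]:
        i -= 1  # rewind so the Turing state can be reset from a TiVo packet
    return index[i][1] if i >= 0 else 0
```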
The encryption bug
Many of the major performance issues I had were related to the hack I needed because sometimes all Video packets are erroneously marked as encrypted. For example, rolling back the Turing state, as mentioned above, sounds easy, but it initially took an incredible amount of execution time because I was copying the Turing state before each packet "just in case". At least in Perl, this cost a lot of time.
One of the things I do now is to save the Turing state before decrypting a packet only if the packet (in its alleged "encrypted" state) is byte-for-byte identical to another file's packet. If it isn't the same as another file's, then it can't be one of these errant "encrypted" packets.
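The lazy-snapshot idea, sketched in Python. `decrypt` stands in for a Turing decryption routine that mutates its state. `copy.deepcopy` stands in for whatever copying the real Turing state requires. Both are assumptions for illustration.

```python
import copy
from typing import Any, Callable, Optional


def decrypt_with_optional_snapshot(
    state: Any,
    packet: bytes,
    other_raw_packets: list[bytes],
    decrypt: Callable[[Any, bytes], bytes],
) -> tuple[bytes, Optional[Any]]:
    """Decrypt `packet`, copying `state` first only when a rollback might be needed.

    A rollback can only be needed if this packet's raw (still "encrypted")
    bytes match another input's packet, so only then is the state copied.
    """
    snapshot = copy.deepcopy(state) if packet in other_raw_packets else None
    return decrypt(state, packet), snapshot
```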
Another thing I do is to not bother decrypting an alleged "encrypted" Video packet if it's not the first packet of four. This also protects me from the rare case of a packet whose errant encryption flag is hard to detect: for example, a packet whose payload is only one byte long and looks the same as another file's packet both before and after decryption.
Lastly, if there's an error decrypting a packet because it's missing the transport stream payload, I assume it's an errant packet that should not have been decrypted in the first place and I clear the encryption bits.
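Both checks can be expressed using only the TS header. This Python sketch makes two assumptions. First, the "encrypted" bits are the two-bit transport_scrambling_control field from the TS spec. Second, `decrypt` is a caller-supplied function. The sketch also checks for a missing payload up front rather than waiting for decryption to fail. `payload_offset` returns None both when there is no payload and when the adaptation field fills the whole packet.

```python
from typing import Callable, Optional

PACKET_SIZE = 188


def payload_offset(packet: bytes) -> Optional[int]:
    """Offset of the payload in a TS packet, or None if there isn't one."""
    adaptation_field_control = (packet[3] >> 4) & 0x03
    if adaptation_field_control in (0, 2):  # reserved, or adaptation field only
        return None
    offset = 4
    if adaptation_field_control == 3:  # adaptation field precedes the payload
        offset += 1 + packet[4]
    return offset if offset < PACKET_SIZE else None


def clear_scrambling_bits(packet: bytes) -> bytes:
    """Zero the 2-bit transport_scrambling_control field (top bits of byte 3)."""
    out = bytearray(packet)
    out[3] &= 0x3F
    return bytes(out)


def prepare_video_packet(
    packet: bytes, expected_encrypted: bool, decrypt: Callable[[bytes], bytes]
) -> bytes:
    """Hypothetical helper: decrypt only when the flag is plausible."""
    if (packet[3] >> 6) == 0:
        return packet  # not flagged as encrypted
    if not expected_encrypted or payload_offset(packet) is None:
        # Not the 1st of 4, or nothing to decrypt: treat the flag as errant.
        return clear_scrambling_bits(packet)
    return decrypt(packet)
```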
Sample output
In case it helps to see context, here's some output from my tool when merging two different downloads of a single recording that had the encryption bug.
Time 2m49s : Writing merged packet; packets/sec=6421, avg packets/sec=6515; ETA is 1m57s
Time 3m : Writing merged packet; packets/sec=6610, avg packets/sec=6521; ETA is 1m53s
Time 3m2s : Input '01/s07e16 - Forever - 2020-09-18_1005.TiVo': ERROR: Error decrypting packet, but assuming it should not have been due to 'hack_no_encryption': No 'payload_offset' set for packet; is there a valid payload?
Time 3m2s : Input '01/s07e16 - Forever - 2020-09-18_1005.TiVo': PID 0x0f0c: WARN: Count indicated this should have been encrypted (count=398788); decrementing 'video_packet_count' because now the following Video packet is expected to be encrypted
Time 3m2s : Input '02/s07e16 - Forever - 2020-09-18_1005.TiVo': ERROR: Error decrypting packet, but assuming it should not have been due to 'hack_no_encryption': No 'payload_offset' set for packet; is there a valid payload?
Time 3m2s : Input '02/s07e16 - Forever - 2020-09-18_1005.TiVo': PID 0x0f0c: WARN: Count indicated this should have been encrypted (count=398788); decrementing 'video_packet_count' because now the following Video packet is expected to be encrypted
Time 3m10s : Writing merged packet; packets/sec=6484, avg packets/sec=6519; ETA is 1m49s
Time 3m21s : Writing merged packet; packets/sec=6620, avg packets/sec=6524; ETA is 1m45s
Time 3m32s : Writing merged packet; packets/sec=6506, avg packets/sec=6523; ETA is 1m41s
Time 3m32s : Input '01/s07e16 - Forever - 2020-09-18_1005.TiVo': PID 0x0f0c: WARN: Preemptively marking packet as not encrypted because only 1st of 4 Video packets is expected to be encrypted (count=465435)
Time 3m42s : Writing merged packet; packets/sec=6742, avg packets/sec=6534; ETA is 1m37s
Time 3m54s : Writing merged packet; packets/sec=6652, avg packets/sec=6540; ETA is 1m33s
Time 3m54s : Input '02/s07e16 - Forever - 2020-09-18_1005.TiVo': choose_uniq_packet: ERROR: Allowing duplicate packet, after encryption, that was also a duplicate before encryption
Time 3m54s : Input '01/s07e16 - Forever - 2020-09-18_1005.TiVo': choose_hack_encryption_packet: WARN: Found best packet with hack to determine it shouldn't have been decrypted
Time 4m4s : Writing merged packet; packets/sec=6672, avg packets/sec=6546; ETA is 1m29s
Time 4m15s : Writing merged packet; packets/sec=6685, avg packets/sec=6552; ETA is 1m24s
Alternative Solution
If I have time, I might try a faster approach, where I'd read from one input file at a time until I hit an error packet. On an error, I'd read the other input files to try to find a valid packet to fix it. This would speed up merge time considerably, because I'd only be reading from a single file most of the time.
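A rough Python sketch of that alternative rests on a big simplifying assumption: the inputs are already aligned, so index i refers to the same packet in every input. Achieving that alignment is exactly the syncing problem discussed below. `is_good` is a hypothetical stand-in for the sync-byte and continuity-counter checks.

```python
from typing import Callable, Sequence


def merge_primary_first(
    inputs: Sequence[Sequence[bytes]], is_good: Callable[[int, bytes], bool]
) -> list[bytes]:
    """Read mostly from inputs[0]; consult the others only when a packet is bad.

    Assumes inputs[k][i] is the same packet position in every input.
    """
    primary, others = inputs[0], inputs[1:]
    output: list[bytes] = []
    for i, packet in enumerate(primary):
        if is_good(i, packet):
            output.append(packet)
            continue
        for other in others:
            if i < len(other) and is_good(i, other[i]):
                output.append(other[i])
                break
        # If no input has a good packet at i, it is simply skipped (lost).
    return output
```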
With the checks I mentioned above (such as the continuity counter), I feel this solution could work reliably. That is, there would be no need to read every packet from every input file to check whether they are duplicates, because the checks may do a sufficient job of detecting when there's a potential packet error.
One thing I don't like about this solution is that I would need to sync up those other input files whenever I need them to fix an error. In my current solution, they are always synced together. Also, intersecting recordings would take some effort to get working with this solution. But, if someone were to write their own utility from scratch, they might try this alternative approach for performance reasons.
Thanks
In closing, I want to thank the authors of tivoconvert and TivoDecoder.jar. I couldn't have done this without being able to review those. From figuring out how the metadata is stored in TiVo files to how to decrypt the packets, I would have given up long ago if it weren't for those reference implementations.
If you want something that works well enough today, I'd recommend using one of those. tivoconvert is awesome, although I don't believe it properly decodes Transport Streams from the TiVo (it was rock-solid for Program Streams). TivoDecoder.jar works well and it's integrated into kmttg as TivoLibre, although it currently drops more data than it needs to when it encounters bad packets.
I also want to thank the authors of kmttg and pyTivo. I can only maximize my TV viewing through TiVo because of them.