Michael IV

Reputation: 11496

LibAV: converting PTS to frame number

I am developing an MP4 demuxer with LibAV and need a helper function that converts a PTS to a frame number. I came up with the following code, which works, but I am not sure how correct it is in the LibAV realm for arbitrary video input. In my case the video fps = 25 and the timescale = 25000. Each successive PTS is incremented by 40, giving me the correct frameNumber.

static double get_frame_rate(AVStream* stream)
{
    if (stream->r_frame_rate.num > 0 && stream->r_frame_rate.den > 0)
    {
        return av_q2d(stream->r_frame_rate);
    }
    else if (stream->avg_frame_rate.num > 0 && stream->avg_frame_rate.den > 0)
    {
        return av_q2d(stream->avg_frame_rate);
    }
    else
    {
        printf("Unable to determine frame rate\n");
        return 0;
    }
}


int64_t timestamp_to_frame(Demuxer* demuxer, int64_t pts)
{
    AVStream* stream = demuxer->fmtc->streams[demuxer->iVideoStream];
    if (!stream || pts < 0)
    {
        fprintf(stderr, "Invalid stream or PTS.\n");
        return -1;
    }

    // Get the frame rate (FPS)
    const double fps = get_frame_rate(stream); // here returns 25.0

    // Stream time base (e.g., 1/25000 for my case)
    const AVRational timeBase = stream->time_base;

    // Calculate frame duration in seconds
    const double frameDurationInSeconds = 1.0 / fps;

    const int64_t ptsIncrement = (int64_t)((frameDurationInSeconds * timeBase.den) * frameDurationInSeconds + 0.5);
    // Calculate the frame number by dividing PTS by the increment
    const int64_t frameNumber = pts / ptsIncrement;

    return frameNumber;
}

I would like to understand whether this code is the proper way of doing it. I tried various libav time-rescaling functions but couldn't get correct results with them.

Upvotes: 2

Views: 50

Answers (1)

Craig Estey

Reputation: 33631

Your code may be fine. But you're using a series of floating-point calculations to derive the 64-bit [integer] PTS increment.

Generally, the PTS increment should be a fixed value. If it isn't, it could make A/V sync harder.

So, I think your code should calculate the PTS increment once and retain a copy. If the increment changes (radically), this may indicate a transition or a missing frame that you need to account for. See below.


I'm not sure if there's a "standard" way to get the PTS increment/delta. But an alternative method I can think of (which you could use as a cross-check against your method) might be:

  1. Create an array of PTS values as they arrive.
  2. Wait until you have at least N frames (where N is [at least] the number of frames in the GOP).
  3. Sort the array in increasing order. This is necessary [only] if you're recording the values in decoding order (e.g. for a GOP of IBBBP, you receive them as IPBBB).
  4. The PTS increment is the difference array[i + 1] - array[i].
  5. This difference should be the same for any two adjacent array elements.

The array could be a sliding window of the last N frames.

If you are receiving the frames in the correct presentation order (e.g. IBBBP), the sorting (and, hence the array) would be unnecessary. And, you would only need the difference between any two frames.


As I mentioned, the PTS increment should remain constant/fixed. Although they shouldn't, some encoders may introduce a small jitter around the PTS increment.

The PTS increment should remain constant unless the stream source changes. In broadcast or live streaming:

  1. This could occur during transitions from program material to commercials (or vice versa).
  2. The stream could switch input from one video source to another (e.g. different cameras, different stations/channels, different video meeting participants).
  3. For live streams over [unreliable] UDP, missing and/or corrupted packets can break the cadence.

For the first two, if such a transition occurs, the stream should [probably] generate an IDR, but I'm not sure if that's a requirement, particularly if the video streaming multiplexer is "reclocking" the PTS/DTS/PCR of the source.

If you're just decoding a locally stored .mp4 file, that's all there is to it.

But, if you're decoding a live stream that uses UDP (vs. TCP), you may have to detect and account for dropped video and/or audio packets. So, monitoring the PTS increment (and hence the frame numbers) for gaps may be necessary.

Upvotes: 1
