obskyr

Reputation: 1498

How does FFmpeg determine the “attached pic” and “timed thumbnails” dispositions of an MP4 track?

The Issue

FFmpeg has a concept of “dispositions” – properties that describe the purpose of a stream in a media file. For example, here are the streams in a file I have lying around; the dispositions are the parenthesized flags at the end of each stream description, such as (default) and (attached pic):

  Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 251 kb/s (default)
      Metadata:
        creation_time   : 2021-11-10T20:14:06.000000Z
        handler_name    : Core Media Audio
        vendor_id       : [0][0][0][0]

  Stream #0:1[0x2](und): Video: mjpeg (Baseline) (jpeg / 0x6765706A), yuvj420p(pc, bt470bg/unknown/unknown), 1024x1024, 0 kb/s, 0.0006 fps, 3.08 tbr, 600 tbn (default) (attached pic) (timed thumbnails)
      Metadata:
        creation_time   : 2021-11-10T20:14:06.000000Z
        handler_name    : Core Media Video
        vendor_id       : [0][0][0][0]

  Stream #0:2[0x3](und): Data: bin_data (text / 0x74786574)
      Metadata:
        creation_time   : 2021-11-10T20:14:06.000000Z
        handler_name    : Core Media Text

  Stream #0:3[0x0]: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 1024x1024 [SAR 144:144 DAR 1:1], 90k tbr, 90k tbn (attached pic)

However, if I make any modification to this file’s chapter markers using the C++ library MP4v2, some of these dispositions are removed. Even just re-saving the existing chapters is enough:

  auto f = MP4Modify("test.m4a");
  MP4Chapter_t* chapterList;
  uint32_t chapterCount;
  MP4GetChapters(f, &chapterList, &chapterCount);
  MP4SetChapters(f, chapterList, chapterCount);
  MP4Close(f);

Afterwards, FFmpeg reports the streams like this:

  Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 251 kb/s (default)
      Metadata:
        creation_time   : 2021-11-10T20:14:06.000000Z
        handler_name    : Core Media Audio
        vendor_id       : [0][0][0][0]

  Stream #0:1[0x2](und): Video: mjpeg (Baseline) (jpeg / 0x6765706A), yuvj420p(pc, bt470bg/unknown/unknown), 1024x1024, 0 kb/s, 0.0006 fps, 3.08 tbr, 600 tbn (default) ← “attached pic” and “timed thumbnails” removed!
      Metadata:
        creation_time   : 2021-11-10T20:14:06.000000Z
        handler_name    : Core Media Video
        vendor_id       : [0][0][0][0]

  Stream #0:2[0x0]: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 1024x1024 [SAR 144:144 DAR 1:1], 90k tbr, 90k tbn (attached pic)

  Stream #0:3[0x4](und): Data: bin_data (text / 0x74786574)
  ← This stream was moved to the end, but that’s intended behavior. It contains chapter titles, and we just edited the chapters.
      Metadata:
        creation_time   : 2025-03-05T09:56:31.000000Z

It also renders the file unplayable in MPC-HC (but not in VLC!), which is apparently a bug in MP4v2. I’m currently investigating that bug to report and potentially fix it, but that’s a separate issue. Along the way, I’m racking my brain trying to understand what MP4v2 changes to make FFmpeg stop reporting the “attached pic” and “timed thumbnails” dispositions. I’ve compared the before and after files in MP4 Box, and I can’t for the life of me find which atom differs in a relevant way.
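For comparing the two files, the dispositions can also be read in machine-readable form rather than from the log output. Here is a minimal Python sketch, assuming ffprobe is on the PATH and relying on the "disposition" object that ffprobe’s JSON output includes for each stream. The helper names probe_streams and set_dispositions are made up for this illustration.

```python
import json
import subprocess


def probe_streams(path):
    """Run ffprobe on `path` and return its parsed list of streams."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)["streams"]


def set_dispositions(streams):
    """Map each stream index to the sorted names of its set disposition flags."""
    return {
        stream["index"]: sorted(
            name
            for name, flag in stream.get("disposition", {}).items()
            if flag == 1
        )
        for stream in streams
    }
```

Calling set_dispositions(probe_streams(...)) on the original file and on the re-saved one, then comparing the two dicts, should show which flags disappeared from which stream. Note that stream indices can shift between the two files, as the moved text stream above shows.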

(I’d love to share the files, but unfortunately the contents are under copyright – if anyone knows of a way to remove the audio from an MP4 file without changing anything else, let me know and I’ll upload dummied-out versions. Without them, I can’t really ask about the issue directly. I can at least show you the files’ respective atom trees, but I’m not sure how relevant that is.)

The Question

I thought I’d read FFmpeg’s source code to find out how it determines dispositions for MP4 streams, but of course, FFmpeg is very complex. Could someone who’s more familiar with C and/or FFmpeg’s codebase help me sleuth out how FFmpeg determines dispositions for MP4 files (in particular, “attached pic” and “timed thumbnails”)?

Some Thoughts…

Upvotes: 2

Views: 43

Answers (1)

obskyr

Reputation: 1498

Though I figured it out by reverse-engineering my MP4 files rather than reading the FFmpeg source code, here’s the answer:

A chap atom can refer not only to chapter text tracks, but also to JPEG video tracks. If a video track is referenced by a chap atom, FFmpeg sets the “attached pic” and “timed thumbnails” dispositions on that track. Other dispositions, such as “default”, are determined differently.

(At the time of writing, MP4v2 doesn’t handle references to video tracks in chap atoms, and instead removes those references but leaves the track intact, resulting in the situation in the question.)
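To check which tracks a file’s chap atoms point at without a GUI tool, the atom tree can be walked directly. Below is a rough Python sketch, not a full MP4 parser. It assumes the usual layout in which chap sits inside a tref atom of a trak, with the track’s own ID in its tkhd, and it reads the whole file into memory. The function names are my own.

```python
import struct


def iter_boxes(buf, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each atom in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", buf, pos + 8)[0]
            header = 16
        elif size == 0:  # atom extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated; stop rather than guess
        yield kind, pos + header, pos + size
        pos += size


def read_track_id(buf, start):
    """Read track_ID from a tkhd payload (version 0 or version 1 layout)."""
    version = buf[start]
    # Skip version/flags (4 bytes) plus creation and modification times.
    offset = start + 4 + (16 if version == 1 else 8)
    return struct.unpack_from(">I", buf, offset)[0]


def chapter_references(buf):
    """Return {track ID: [track IDs listed in that track's tref/chap atom]}."""
    refs = {}
    for top, m_start, m_end in iter_boxes(buf):
        if top != b"moov":
            continue
        for child, t_start, t_end in iter_boxes(buf, m_start, m_end):
            if child != b"trak":
                continue
            track_id, targets = None, []
            for kind, b_start, b_end in iter_boxes(buf, t_start, t_end):
                if kind == b"tkhd":
                    track_id = read_track_id(buf, b_start)
                elif kind == b"tref":
                    for ref, r_start, r_end in iter_boxes(buf, b_start, b_end):
                        if ref == b"chap":
                            count = (r_end - r_start) // 4
                            targets += struct.unpack_from(f">{count}I", buf, r_start)
            if targets:
                refs[track_id] = targets
    return refs
```

Running chapter_references(open(path, "rb").read()) on the file before and after the MP4v2 round trip should show whether a video track’s ID has dropped out of the list.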

Upvotes: 2
