Reputation: 19
When trying to read a 4K video marked as 3D left-right, IMFSourceReader returns images of size 1920x2160 (half the image). I'd like to get the full image if possible, or at least have access to the second half.
I'm aware of MF_ENABLE_3DVIDEO_OUTPUT, but I'm not sure how to apply it to an IMFSourceReader. I tried to set it on the media type, but that didn't change anything.
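For context, the kind of thing I tried looks roughly like this (just a sketch; I may well be setting the attribute in the wrong place, which is part of the question):

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <mftransform.h> // MF_ENABLE_3DVIDEO_OUTPUT, MF3DVideoOutputType

// Sketch of the attempt: pass MF_ENABLE_3DVIDEO_OUTPUT in the attributes used
// to create the source reader (I also tried setting it on the media type, and
// tried MF3DVideoOutputType_BaseView). Neither changed what ReadSample returns.
HRESULT CreateReaderWith3DOutput(PCWSTR url, IMFSourceReader** ppReader)
{
    IMFAttributes* pAttributes = nullptr;
    HRESULT hr = MFCreateAttributes(&pAttributes, 1);
    if (SUCCEEDED(hr))
        hr = pAttributes->SetUINT32(MF_ENABLE_3DVIDEO_OUTPUT, MF3DVideoOutputType_Stereo);
    if (SUCCEEDED(hr))
        hr = MFCreateSourceReaderFromURL(url, pAttributes, ppReader);
    if (pAttributes)
        pAttributes->Release();
    return hr;
}
```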
The source reader reports a stream size of half the frame (1920x2160 for a 4K movie), and when I call GetBufferCount on the sample, the result is 1. So I have no idea how to get all the data of the frame.
I looked at the DX11VideoRenderer sample, which seems to assume that GetBufferCount returns 2. However, it doesn't use IMFSourceReader, so I'm not sure how to apply what it does to this scenario.
Optimally, what I want is to use MF3DVideoOutputType_BaseView and get the full 4K source image.
Edit:
This has to do with Facebook 180 data (with the "Half Equirectangular" and "Side-by-Side" settings). It results in YouTube V1 spherical metadata in the MP4, with a particular setting that Facebook recognises for 180 degree stereo videos.
An example video is available here: https://drive.google.com/open?id=154dl33y9RKZcvTqdBZkLQ5Y5ckG2mZtf (it will be removed at some point in the future; if anyone has a better recommendation of where to upload it, feel free to suggest it).
Upvotes: 1
Views: 862
Reputation: 69724
This might not be exactly an answer, because I can do what you are trying to do with a straightforward read. The steps below, however, might give you a hint where to troubleshoot.
"IMFSourceReader returns images of size 1920x2160 (half the image). I'd like to get the full image if possible, or at least have access to the second half."
I processed your sample video with an application that uses MF Source Reader to read video, decompress and save as individual frames. I see both halves of the video accessible.
Here are the details.
Video media type indicates 3D video:
MF_MT_MAJOR_TYPE, vValue {73646976-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFMediaType_Video, FourCC vids)
MF_MT_SUBTYPE, vValue {34363248-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFVideoFormat_H264, FourCC H264)
MF_MT_AM_FORMAT_TYPE, vValue {E06D80E3-DB46-11CF-B4D1-00805F6CBBEA} (Type VT_CLSID, WMFORMAT_MPEG2Video)
MF_MT_VIDEO_PROFILE, vValue 100 (Type VT_UI4)
MF_MT_VIDEO_LEVEL, vValue 51 (Type VT_UI4)
MF_MT_FRAME_SIZE, vValue 16492674418800 (Type VT_UI8, 3840x2160)
MF_MT_PIXEL_ASPECT_RATIO, vValue 4294967297 (Type VT_UI8, 1:1)
MF_MT_INTERLACE_MODE, vValue 7 (Type VT_UI4)
MF_MT_FRAME_RATE, vValue 128849018881001 (Type VT_UI8, 30000/1001, 29.970)
MF_MT_SAMPLE_SIZE, vValue 1 (Type VT_UI4)
MF_MT_AVG_BITRATE, vValue 82101870 (Type VT_UI4)
MF_MT_MPEG4_CURRENT_SAMPLE_ENTRY, vValue 0 (Type VT_UI4)
MF_MT_MPEG4_SAMPLE_DESCRIPTION, vValue 00 00 59 2A 73 74 73 64 00 00 00 00 00 00 00 01 00 00 59 1A 61 76 63 31 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0F 00 08 70 00 48 00 00 00 48 00 00 00 00 00 00 00 01 15 41 6D 62 61 72 65 6C 6C 61 20 41 56 43 20 65 6E 63 6F 64 65 72 00 00 00 00 00 00 00 00 00 00 00 18 FF FF 00 00 00 4B 61 76 63 43 01 64 00 33 FF E1 00 34 27 64 00 33 AC 34 C8 03 C0 04... (Type VT_VECTOR | VT_UI1)
MF_MT_MPEG_SEQUENCE_HEADER, vValue 00 00 01 27 64 00 33 AC 34 C8 03 C0 04 3E 84 00 00 0F A4 00 03 A9 83 A1 80 00 4C 4B 40 00 03 93 87 0B BC B8 D0 C0 00 26 25 A0 00 01 C9 C3 85 DE 5C 3E 11 08 D4 00 00 00 00 01 28 EE 38 B0 (Type VT_VECTOR | VT_UI1)
MF_MT_VIDEO_3D, vValue 1 (Type VT_UI4)
MF_MT_VIDEO_3D_FORMAT, vValue 2 (Type VT_UI4)
MF_MT_VIDEO_ROTATION, vValue 0 (Type VT_UI4)
MF_NALU_LENGTH_SET, vValue 1 (Type VT_UI4)
MF_PROGRESSIVE_CODING_CONTENT, vValue 1 (Type VT_UI4)
{11D25A49-BB62-467F-9DB1-C17165716C49}, vValue 00 00 00 00 00 00 00 00 00 00 00 00 (Type VT_VECTOR | VT_UI1)
{4A8FC407-6EA1-46C8-B567-6971D4A139C3}, vValue 0 (Type VT_UI4)
{A51DA449-3FDC-478C-BCB5-30BE76595F55}, vValue 1 (Type VT_UI4)
Note the 3840x2160 resolution and the MF_MT_VIDEO_3D_FORMAT value of 2, which is MFVideo3DSampleFormat_Packed_LeftRight: "Each media sample contains one buffer, with both views packed side-by-side into a single frame."
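For reference, a minimal sketch of how these attributes can be checked on the reader's native media type (assuming a reader already created from the file; MFGetAttributeUINT32 and MFGetAttributeSize are the standard mfapi.h helpers):

```cpp
#include <mfapi.h>
#include <mfreadwrite.h>

// Sketch: query the native media type of the first video stream and check
// the stereo 3D attributes listed above.
HRESULT Check3DFormat(IMFSourceReader* pReader)
{
    IMFMediaType* pNativeType = nullptr;
    HRESULT hr = pReader->GetNativeMediaType(
        (DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, &pNativeType);
    if (SUCCEEDED(hr))
    {
        UINT32 is3D = MFGetAttributeUINT32(pNativeType, MF_MT_VIDEO_3D, FALSE);
        UINT32 format = MFGetAttributeUINT32(pNativeType, MF_MT_VIDEO_3D_FORMAT, 0);
        UINT32 width = 0, height = 0;
        MFGetAttributeSize(pNativeType, MF_MT_FRAME_SIZE, &width, &height);
        // Expected for this file: is3D == 1,
        // format == MFVideo3DSampleFormat_Packed_LeftRight (2),
        // width == 3840, height == 2160.
        pNativeType->Release();
    }
    return hr;
}
```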
This seems to be a correct read of your file. My application sets up the Source Reader with a SetCurrentMediaType call using the following media type:
MF_MT_MAJOR_TYPE, vValue {73646976-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFMediaType_Video, FourCC vids)
MF_MT_SUBTYPE, vValue {00000016-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFVideoFormat_RGB32, FourCC 0x00000016)
MF_MT_FRAME_SIZE, vValue 16492674418800 (Type VT_UI8, 3840x2160)
MF_MT_PIXEL_ASPECT_RATIO, vValue 4294967297 (Type VT_UI8, 1:1)
MF_MT_INTERLACE_MODE, vValue 2 (Type VT_UI4)
MF_MT_FRAME_RATE, vValue 128849018881001 (Type VT_UI8, 30000/1001, 29.970)
That is, it requests decompression of video into full resolution RGB format.
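A minimal sketch of that request (it mirrors the media type listed above; the attributes not set here can be queried back with GetCurrentMediaType after the call):

```cpp
#include <mfapi.h>
#include <mfreadwrite.h>

// Sketch: request full-resolution RGB32 output from the Source Reader,
// matching the media type listed above.
HRESULT RequestRgb32Output(IMFSourceReader* pReader)
{
    IMFMediaType* pType = nullptr;
    HRESULT hr = MFCreateMediaType(&pType);
    if (SUCCEEDED(hr))
        hr = pType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    if (SUCCEEDED(hr))
        hr = pType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
    if (SUCCEEDED(hr))
        hr = MFSetAttributeSize(pType, MF_MT_FRAME_SIZE, 3840, 2160);
    if (SUCCEEDED(hr))
        hr = pReader->SetCurrentMediaType(
            (DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, nullptr, pType);
    if (pType)
        pType->Release();
    return hr;
}
```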
Source Reader is okay with such request and supplies a video decoder to satisfy the format conversion:
Category MFT_CATEGORY_VIDEO_DECODER, Direct3D 11 Aware, Input MFVideoFormat_H264, 3840 x 2160, Output MFVideoFormat_NV12, 3840 x 2160
Apparently the decoder is the H.264 Video Decoder MFT, in case you want to manage it directly outside of the internal Source Reader pipeline.
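If you want to reach that decoder from the reader, IMFSourceReaderEx::GetTransformForStream can hand it back to you; a sketch (the transform index may vary depending on what the reader inserted):

```cpp
#include <mfreadwrite.h>

// Sketch: obtain the decoder MFT the Source Reader inserted for the video
// stream, e.g. to set attributes on it directly.
HRESULT GetVideoDecoder(IMFSourceReader* pReader, IMFTransform** ppDecoder)
{
    IMFSourceReaderEx* pReaderEx = nullptr;
    HRESULT hr = pReader->QueryInterface(IID_PPV_ARGS(&pReaderEx));
    if (SUCCEEDED(hr))
    {
        GUID category = {};
        hr = pReaderEx->GetTransformForStream(
            (DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
            0,           // first transform in the chain; index may vary
            &category,   // expected: MFT_CATEGORY_VIDEO_DECODER
            ppDecoder);
        pReaderEx->Release();
    }
    return hr;
}
```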
The first read video sample has the following attributes:
MF_NALU_LENGTH_INFORMATION, vValue (Type VT_VECTOR | VT_UI1)
MFSampleExtension_ForwardedDecodeUnits, vValue ??? (Type VT_UNKNOWN)
MFSampleExtension_AccumulatedNonRefPicPercent, vValue 0 (Type VT_UI4)
MFSampleExtension_Token, vValue ??? (Type VT_UNKNOWN, 0x00000282397B1020)
MFSampleExtension_CleanPoint, vValue 1 (Type VT_UI4)
MFSampleExtension_Discontinuity, vValue 1 (Type VT_UI4)
MFSampleExtension_FrameCorruption, vValue 0 (Type VT_UI4)
nSampleTime 0, nSampleDuration 333666, nBufferCount 1, nTotalLength 33177600
nBufferIndex 0, nCurrentLength 33177600, nMaxLength 33177600
As you can see, it has one buffer and the data size is 3840 * 2160 * 4 bytes. The decoded image itself contains both halves.
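The read loop itself is ordinary; a sketch of how the buffer count and lengths above were obtained:

```cpp
#include <mfapi.h>
#include <mfreadwrite.h>

// Sketch: read one decoded frame and inspect its buffers. For this file the
// expectation is one buffer of 3840 * 2160 * 4 bytes containing both views.
HRESULT ReadOneFrame(IMFSourceReader* pReader)
{
    DWORD streamIndex = 0, flags = 0;
    LONGLONG timestamp = 0;
    IMFSample* pSample = nullptr;
    HRESULT hr = pReader->ReadSample(
        (DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0,
        &streamIndex, &flags, &timestamp, &pSample);
    if (SUCCEEDED(hr) && pSample)
    {
        DWORD bufferCount = 0, totalLength = 0;
        pSample->GetBufferCount(&bufferCount);
        pSample->GetTotalLength(&totalLength);
        // Observed here: bufferCount == 1, totalLength == 33177600.

        IMFMediaBuffer* pBuffer = nullptr;
        if (SUCCEEDED(pSample->ConvertToContiguousBuffer(&pBuffer)))
        {
            BYTE* pData = nullptr;
            DWORD currentLength = 0;
            if (SUCCEEDED(pBuffer->Lock(&pData, nullptr, &currentLength)))
            {
                // pData holds the full side-by-side RGB32 frame.
                pBuffer->Unlock();
            }
            pBuffer->Release();
        }
        pSample->Release();
    }
    return hr;
}
```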
This is the behavior on Windows 10 October 2018 Update (version 1809). I suppose it basically matches what you originally asked for. I also see that the MP4 atoms indicate the full resolution (3840x2160), so the behavior described above, and which I am actually seeing, is quite expected. More to that, even the SDK's topoedit plays the file with both halves, meaning that to get the Movies and TV behavior, where the stereo view is blended from the halves, the decoder has to be specifically configured that way.
As I understand it, you are seeing different behavior, and there should be a reason for it, most likely related to the video decoder or, alternatively, to a post-decoding step that either strips half of the video or composes a joined view where you did not request it. Since the video is encoded in a left+right fashion, I would say it is unlikely that something in the pipeline is hardcoded to drop the second half with no control over it; more probably it is a problem with the pipeline configuration.
Still, it might also be that earlier versions of Windows lacked support for spherical video and truncated the frame: they recognized that it consists of two halves, but did not yet have a code path and implementation for spherical capabilities.
UPDATE 27-Dec-2018: The problem seems to be limited to, or at least related to, enabling MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING. When it is enabled, the Source Reader applies the Video Processor MFT for format conversion instead of its internal converter (which, AFAIR, is not hardware accelerated). The internal non-hardware converter outputs both views transparently, presumably without even knowing that the frame carries two views. The Video Processor MFT, however, declares stereo 3D capabilities, and in its default mode of operation it drops the second half.
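For clarity, this is how that flag gets enabled at reader creation time; this appears to be the configuration where the second view is dropped, while omitting it keeps both halves (a sketch):

```cpp
#include <mfapi.h>
#include <mfreadwrite.h>

// Sketch: creating the reader with advanced video processing enabled, which
// pulls in the Video Processor MFT for format conversion. This appears to be
// the configuration that drops the second view; without it, the internal
// converter passes both halves through.
HRESULT CreateReaderAdvanced(PCWSTR url, IMFSourceReader** ppReader)
{
    IMFAttributes* pAttributes = nullptr;
    HRESULT hr = MFCreateAttributes(&pAttributes, 1);
    if (SUCCEEDED(hr))
        hr = pAttributes->SetUINT32(
            MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING, TRUE);
    if (SUCCEEDED(hr))
        hr = MFCreateSourceReaderFromURL(url, pAttributes, ppReader);
    if (pAttributes)
        pAttributes->Release();
    return hr;
}
```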
In a quick look I was unable to enable its 3D output options and switch it to keeping the right half, whether as part of a single buffer, as a secondary buffer, or as a secondary texture surface. However, since the second half appears to be stripped in a post-decoder step, it should work out well if, for example (and there might be a number of similar ways to do this trick), the NV12 texture is read from the Source Reader, the 3D information is then removed from the sample/texture, and the subsequent pixel format conversion, including a GPU-enabled one, converts the full 3840x2160 frame without dropping the second half.
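One hedged reading of that suggestion in code, assuming you read NV12 from the reader and drive the downstream converter yourself: copy the media type and delete the stereo markers, so the converter treats the frame as an ordinary 3840x2160 image.

```cpp
#include <mfapi.h>

// Sketch of the workaround idea: strip the stereo 3D markers from a copy of
// the NV12 media type delivered by the Source Reader before using it to
// configure the downstream converter (e.g. the Video Processor MFT), so the
// converter has no reason to discard the second view. If the samples carry
// MFSampleExtension_3DVideo, deleting that attribute as well may be needed.
HRESULT StripStereo3DInfo(IMFMediaType* pReaderType, IMFMediaType** ppFlatType)
{
    IMFMediaType* pCopy = nullptr;
    HRESULT hr = MFCreateMediaType(&pCopy);
    if (SUCCEEDED(hr))
        hr = pReaderType->CopyAllItems(pCopy);
    if (SUCCEEDED(hr))
    {
        pCopy->DeleteItem(MF_MT_VIDEO_3D);        // deletion is harmless if absent
        pCopy->DeleteItem(MF_MT_VIDEO_3D_FORMAT);
        *ppFlatType = pCopy;
        return S_OK;
    }
    if (pCopy)
        pCopy->Release();
    return hr;
}
```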
Upvotes: 2