Reputation: 31
I am developing a special video player that applies "filters" to each decoded frame. My current goal is to scale a decoded frame (no constraints other than the memory limit, of course).
The decoding part is done with ffmpeg (av_read_frame, avcodec_send_packet, avcodec_receive_frame). The EVR from Media Foundation is used as the video renderer. More precisely, I retrieve a "sample" (just a wrapper around a D3D offscreen plain surface), ffmpeg decodes the frame into that buffer, and I hand the sample to the renderer, which caches it and presents it on screen at the right time (computed from the sample timestamp, the playback rate and the system clock).
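For reference, the decode loop is essentially the following (a minimal sketch; fmtCtx, codecCtx and videoStreamIndex are placeholders for my real setup, and error handling is trimmed):

```cpp
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}

AVPacket* pkt = av_packet_alloc();
AVFrame* frame = av_frame_alloc();

while (av_read_frame(fmtCtx, pkt) >= 0) {
    if (pkt->stream_index == videoStreamIndex &&
        avcodec_send_packet(codecCtx, pkt) == 0) {
        // One packet may produce zero or more frames.
        while (avcodec_receive_frame(codecCtx, frame) == 0) {
            // The decoded pixels end up in the buffer the "sample" wraps;
            // the sample is then handed to the EVR for presentation.
        }
    }
    av_packet_unref(pkt);
}

av_frame_free(&frame);
av_packet_free(&pkt);
```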
I retrieve surfaces (format=X8R8G8B8, type=D3DRTYPE_SURFACE, usage=0, pool=D3DPOOL_DEFAULT, multisample=D3DMULTISAMPLE_NONE) from a pool of available surfaces (via IMFVideoSampleAllocator). Working with RGB32 data is a requirement, so decoded frames are converted if needed.
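Getting a sample and its underlying surface looks roughly like this (a sketch; allocator is an already-initialized IMFVideoSampleAllocator, release calls omitted):

```cpp
#include <d3d9.h>
#include <mfidl.h>  // IMFVideoSampleAllocator, MFGetService
#include <evr.h>    // MR_BUFFER_SERVICE

IMFSample* sample = nullptr;
IMFMediaBuffer* buffer = nullptr;
IDirect3DSurface9* surface = nullptr;

HRESULT hr = allocator->AllocateSample(&sample);
if (SUCCEEDED(hr))
    hr = sample->GetBufferByIndex(0, &buffer);
if (SUCCEEDED(hr))
    // MR_BUFFER_SERVICE exposes the D3D9 surface wrapped by the buffer.
    hr = MFGetService(buffer, MR_BUFFER_SERVICE, IID_PPV_ARGS(&surface));
// ffmpeg then writes the converted RGB32 frame into this surface
// (via IDirect3DSurface9::LockRect).
```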
Concerning the scaling feature/zoom filter, I first used libswscale (the sws_scale function) with SWS_FAST_BILINEAR, but it takes ~80 ms to resize a frame from 1920x800 to 1920x400 (fixed values for test purposes). I then tried doing the scaling myself with a naive bilinear algorithm, but that was even worse and took ages to complete.
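The libswscale path is essentially this (a sketch with my fixed test sizes; I'm assuming X8R8G8B8 maps to AV_PIX_FMT_BGRA on little-endian, and srcPixels/dstPixels stand in for the real top-down pixel pointers):

```cpp
extern "C" {
#include <libswscale/swscale.h>
}

SwsContext* sws = sws_getContext(
    1920, 800, AV_PIX_FMT_BGRA,   // source size/format
    1920, 400, AV_PIX_FMT_BGRA,   // destination size/format
    SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);

const uint8_t* srcData[4]   = { srcPixels, nullptr, nullptr, nullptr };
int            srcStride[4] = { 1920 * 4, 0, 0, 0 };
uint8_t*       dstData[4]   = { dstPixels, nullptr, nullptr, nullptr };
int            dstStride[4] = { 1920 * 4, 0, 0, 0 };

// This single call is what takes ~80 ms when the buffers live in
// D3D surfaces, but only a few ms with plain malloc'ed memory.
sws_scale(sws, srcData, srcStride, 0, 800, dstData, dstStride);
sws_freeContext(sws);
```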
I've made a minimal test case that loads a BMP file, scales it and writes the scaled data to another BMP. Surprisingly, the same code takes only ~15 ms (libswscale) or ~30 ms (naive bilinear) there.
I then modified my video player to use av_image_alloc and av_image_copy_to_buffer. Allocating takes no time, copying takes a whole second, and the scaling itself only ~5 ms. The extra copy makes the whole path far too slow for real-time scaling, but it shows there is a big difference depending on where the memory comes from (malloc'ed buffer vs. D3D surface).
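The experiment was roughly the following (a sketch; lockedBits/lockedPitch stand in for what IDirect3DSurface9::LockRect returns):

```cpp
extern "C" {
#include <libavutil/imgutils.h>
}

uint8_t* data[4];
int      linesize[4];
// Allocate an aligned CPU-side buffer for a 1920x800 BGRA image: instant.
av_image_alloc(data, linesize, 1920, 800, AV_PIX_FMT_BGRA, 16);

const uint8_t* src[4]       = { lockedBits, nullptr, nullptr, nullptr };
int            srcStride[4] = { lockedPitch, 0, 0, 0 };
int bufSize = av_image_get_buffer_size(AV_PIX_FMT_BGRA, 1920, 800, 16);

// Copying out of the D3D surface: ~1 s. Scaling the copy afterwards: ~5 ms.
av_image_copy_to_buffer(data[0], bufSize, src, srcStride,
                        AV_PIX_FMT_BGRA, 1920, 800, 16);

av_freep(&data[0]);
```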
Data alignment could be the cause of the slowness, but my test case uses the same memory layout (stride = width * 4, bottom-up) and is much faster. I printed the input and output buffer addresses modulo 16 and both are 0, so alignment looks fine to me.
I also tried the IDirect3DDevice9::StretchRect method, but it doesn't work between offscreen plain surfaces.
Any idea? NB: I plan to create the surfaces and present them myself eventually, so the renderer part is only a weak dependency for me. If you have a plain D3D sample as a reference, I'll take it.
Upvotes: 1
Views: 675
Reputation: 31
I studied the "EVRPresenter sample" code: scaling is done via the source/destination RECTs passed to the IDirect3DSwapChain9::Present method, so I guessed IDirect3DDevice9::StretchRect was the right approach.
Since StretchRect doesn't support stretching between offscreen plain surfaces, I created a render target surface with IDirect3DDevice9::CreateRenderTarget, and now the StretchRect call works. Scaling is even fast enough to display 4K videos without any jitter! I keep libswscale as a fallback.
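In sketch form (device and srcSurface assumed to exist, error handling trimmed):

```cpp
#include <d3d9.h>

IDirect3DSurface9* dstSurface = nullptr;
HRESULT hr = device->CreateRenderTarget(
    1920, 400,                // destination size
    D3DFMT_X8R8G8B8,
    D3DMULTISAMPLE_NONE, 0,
    FALSE,                    // not lockable; it stays on the GPU
    &dstSurface, nullptr);

if (SUCCEEDED(hr))
    // StretchRect accepts an offscreen plain source as long as the
    // destination is a render target; D3DTEXF_LINEAR = bilinear filtering.
    hr = device->StretchRect(srcSurface, nullptr,
                             dstSurface, nullptr,
                             D3DTEXF_LINEAR);
```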
@VuVirt I'm currently using the EVR, which is DXVA-aware, so I guess DXVA is used internally. I will read the API carefully; I'll probably use it when I write my own presenter.
I've realized that mixing/scaling/presenting frames is really the renderer's job. My current code works, but it depends on renderer internals. The renderer may use Direct3D 10 interfaces at some point, and StretchRect is not available in Direct3D 10.
Anyway, thanks for reading!
Upvotes: 1