Reputation: 36549
I need to display the local webcam stream on the screen, horizontally flipped, so that the screen appears as a mirror. I have a DirectShow graph which does all of this, except for mirroring the image. I have tried several approaches to mirror the image, but none have worked.
VideoControlFlag_FlipHorizontal
I tried setting the VideoControlFlag_FlipHorizontal flag on the output pin of the webcam filter, like so:
IAMVideoControl* pAMVidControl;
IPin* pWebcamOutputPin;
// ...
// Omitting error-handling for brevity
pAMVidControl->SetMode(pWebcamOutputPin, VideoControlFlag_FlipHorizontal);
However, this has no effect. Indeed, the webcam filter claims to not have this capability, or any other capabilities:
long supportedModes;
hr = pAMVidControl->GetCaps(pWebcamOutputPin, &supportedModes);
// Prints 0, i.e. no capabilities
printf("Supported modes: %ld\n", supportedModes);
SetVideoPosition
I tried flipping the image by flipping the rectangles passed to SetVideoPosition. (I am using an Enhanced Video Renderer filter, in windowless mode.) There are two rectangles: a source rectangle and a destination rectangle. I tried both.
Here's approach B(i), flipping the source rectangle:
MFVideoNormalizedRect srcRect;
srcRect.left = 1.0; // note flipped
srcRect.right = 0.0; // note flipped
srcRect.top = 0.0;
srcRect.bottom = 0.5;
return m_pVideoDisplay->SetVideoPosition(&srcRect, &destRect);
This results in nothing being displayed. It works in other configurations, but appears to dislike srcRect.left > srcRect.right.
Here's approach B(ii), flipping the destination rectangle:
RECT destRect;
GetClientRect(hwnd, &destRect);
LONG left = destRect.left;
destRect.left = destRect.right;
destRect.right = left;
return m_pVideoDisplay->SetVideoPosition(NULL, &destRect);
This also results in nothing being displayed. It works in other configurations, but appears to dislike destRect.left > destRect.right.
IMFVideoProcessorControl::SetMirror
IMFVideoProcessorControl::SetMirror(MF_VIDEO_PROCESSOR_MIRROR) sounds like what I want. This IMFVideoProcessorControl interface is implemented by the Video Processor MFT. Unfortunately, this is a Media Foundation Transform, and I can't see how I could use it in DirectShow.
The Video Resizer DSP is "a COM object that can act as a DMO", so theoretically I could use it in DirectShow. Unfortunately, I have no experience with DMOs, and in any case, the docs for the Video Resizer don't say whether it would support flipping the image.
IVMRMixerControl9::SetOutputRect
I found IVMRMixerControl9::SetOutputRect, which explicitly says:
Because this rectangle exists in compositional space, there is no such thing as an "invalid" rectangle. For example, set left greater than right to mirror the video in the x direction.
However, IVMRMixerControl9 appears to be deprecated, and I'm using an EVR rather than a VMR, and there are no docs on how to obtain an IVMRMixerControl9 anyway.
I'm reluctant to try this one unless I have to. It will be a major investment, and I'm not sure it will be performant enough anyway.
Media Foundation would possibly allow me to solve this problem, because it provides "Media Foundation Transforms". But it's not even clear that Media Foundation would fit all my other requirements.
I'm very surprised that I am looking at such radical solutions for a transform that seems so standard. What other approaches exist? Is there anything I've overlooked in the approaches I've tried? How can I horizontally mirror video in DirectShow?
Upvotes: 2
Views: 1118
Reputation: 69714
If Option E does not work (see the comment above: neither the source nor the destination rectangle allows mirroring), then, given that this is DirectShow, I would suggest looking into Option F.
However, writing a full filter might not be trivial if you have never done it before. There are a few shortcuts, though. You don't need to develop a full filter: similar functionality can be achieved using at least two alternative methods:
ISampleGrabberCB::SampleCB callback. You will find many mentions of this technique: when inserted into the graph, your code receives a callback for every processed frame. If you rearrange the pixels in the frame buffer within the callback, the image will be mirrored.
Both of these will be easier to do because you don't have to use the DirectShow BaseClasses, which are notoriously obsolete in 2020.
Neither of them requires you to understand the data flow in a DirectShow filter. Both, as well as developing a full DirectShow filter, assume that your code supports rearrangement in a limited set of pixel formats. You can go with 24-bit RGB, for example, or with typical webcam formats such as NV12 (nowadays).
If your pixel data rearrangement is reasonably well written, there is no need to super-optimize the code: the performance impact is negligible in most cases.
I expect integrating a Media Foundation solution to be more complicated, and much more complicated if the Media Foundation solution is to be really well optimized.
The complexity of the problem in the first place comes from a combination of the following factors.
First, you mixed different solutions:
Your approach A is #1 above. However, if there is no support for the respective mode, you can't mirror.
Mirroring in the EVR renderer (#3) is apparently possible in theory. The EVR uses Direct3D 9 and internally renders a surface (texture) into a scene, so it is entirely possible to set up the 3D position of the surface in such a way that it becomes mirrored. However, the problem here is that the API design and coordinate checks prevent you from passing mirroring arguments.
Then, Direct3D 9 is pretty much deprecated, and DirectShow itself and even the DirectShow/Media Foundation EVR are in no way compatible with current Direct3D 11. Even though a capability to mirror via hardware might exist, you might have a hard time consuming it through the legacy API.
As you want a simple solution, you are limited to mirroring the data as it is streamed through, that is, #2. Even though this comes with a reasonable performance impact, you don't need to rely on specific camera or video hardware support: you just swap the pixels in every frame and that's it.
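To make "swap the pixels in every frame" concrete, here is a minimal sketch for 24-bit RGB. It is not tied to any DirectShow interface: the function name MirrorRgb24 and its parameters are hypothetical, and it assumes you already have a pointer to the frame buffer plus its width, height and row stride (uncompressed RGB rows are commonly padded to a 4-byte boundary, so the stride may be larger than width * 3).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper: mirrors a 24-bit RGB frame horizontally, in place.
// `stride` is the distance in bytes between the starts of consecutive rows.
// Row order (top-down or bottom-up) does not matter for a horizontal flip.
void MirrorRgb24(std::uint8_t* data, int width, int height, std::ptrdiff_t stride)
{
    assert(data != nullptr && width >= 0 && height >= 0);
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = data + y * stride;
        // Swap pixel x with pixel (width - 1 - x); the middle pixel of an
        // odd-width row stays where it is. Padding bytes are never touched.
        for (int x = 0; x < width / 2; ++x) {
            std::uint8_t* a = row + 3 * x;
            std::uint8_t* b = row + 3 * (width - 1 - x);
            std::swap(a[0], b[0]);
            std::swap(a[1], b[1]);
            std::swap(a[2], b[2]);
        }
    }
}
```

The three bytes of a pixel move together, so the channel order (BGR vs. RGB) is irrelevant to the flip.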
As I mentioned, the easiest way is to set up a SampleCB callback using the 24-bit RGB and/or NV12 pixel format. It depends on whatever else your application is doing, but with no such information I would say that it is sufficient to implement 24-bit RGB: having the video frame data, you go row by row and swap the three-byte pixels width/2 times. If the application pipeline allows, you might want an additional code path to flip NV12, which is similar but does not require the video to be converted to RGB in the first place and so is a bit more efficient. If NV12 can't work, RGB24 would be the backup code path.
See also: Mirror effect with DirectShow.NET - I seem to have already explained something similar 8 years ago.
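For the NV12 path, a rough sketch could look like the following. NV12 stores a full-resolution Y plane followed by a half-resolution plane of interleaved U/V byte pairs, so the luma bytes are reversed one by one while the chroma is reversed pair by pair. The name MirrorNv12 is hypothetical, and the sketch assumes that width and height are even, that both planes share the same stride, and that the UV plane starts immediately after the Y plane (at data + height * stride); check these against the actual media type before relying on them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper: mirrors an NV12 frame horizontally, in place.
// Assumes even dimensions and a UV plane that directly follows the Y plane.
void MirrorNv12(std::uint8_t* data, int width, int height, std::ptrdiff_t stride)
{
    assert(data != nullptr && width % 2 == 0 && height % 2 == 0 && stride >= width);

    // Y plane: one byte per pixel, so reversing each row mirrors it.
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = data + y * stride;
        std::reverse(row, row + width);
    }

    // UV plane: height / 2 rows, each holding width / 2 interleaved (U, V) pairs.
    // Pairs are swapped as units so that U and V are not exchanged.
    std::uint8_t* uv = data + height * stride;
    const int pairs = width / 2;
    for (int y = 0; y < height / 2; ++y) {
        std::uint8_t* row = uv + y * stride;
        for (int x = 0; x < pairs / 2; ++x) {
            std::uint8_t* a = row + 2 * x;
            std::uint8_t* b = row + 2 * (pairs - 1 - x);
            std::swap(a[0], b[0]);
            std::swap(a[1], b[1]);
        }
    }
}
```

Inside a SampleCB callback, the buffer pointer would come from the media sample and the dimensions from the negotiated media type; that wiring is left out here.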
Upvotes: 1