Reputation: 2531
I'm looking for the fastest way to decode a local mpeg-4 video's frames on the iPhone. I'm simply interested in the luminance values of the pixels in every 10th frame. I don't need to render the video anywhere.
I've tried ffmpeg, AVAssetReader, ImageAssetGenerator, OpenCV, and MPMoviePlayer but they're all too slow. The fastest speed I can get is ~2x (2 minutes of video scanned in a minute). I'd like something closer to 10x.
Assuming my attempts above didn't utilize the GPU, is there any way to accomplish my goal with something that does run on the GPU? OpenGL seems mostly geared toward rendering output, but I have seen it used to filter incoming video. Maybe that's an option?
Thanks in advance!
Upvotes: 15
Views: 7486
Reputation: 93458
Assuming the bottleneck of your application is in the code that converts the video frames to a displayable format (like RGB), you might be interested in some code I shared that converts one .mp4 frame (encoded as YV12) to RGB using Qt and OpenGL. The application uploads the frame to the GPU and runs a GLSL fragment shader to do the YV12-to-RGB conversion, so the result can be displayed in a QImage.
static const char *p_s_fragment_shader =
"#extension GL_ARB_texture_rectangle : enable\n"
"uniform sampler2DRect tex;"
"uniform float ImgHeight, chromaHeight_Half, chromaWidth;"
"void main()"
"{"
" vec2 t = gl_TexCoord[0].xy;" // get texcoord from fixed-function pipeline
" float CbY = ImgHeight + floor(t.y / 4.0);"
" float CrY = ImgHeight + chromaHeight_Half + floor(t.y / 4.0);"
" float CbCrX = floor(t.x / 2.0) + chromaWidth * floor(mod(t.y, 2.0));"
" float Cb = texture2DRect(tex, vec2(CbCrX, CbY)).x - .5;"
" float Cr = texture2DRect(tex, vec2(CbCrX, CrY)).x - .5;"
" float y = texture2DRect(tex, t).x;" // redundant texture read optimized away by texture cache
" float r = y + 1.28033 * Cr;"
" float g = y - .21482 * Cb - .38059 * Cr;"
" float b = y + 2.12798 * Cb;"
" gl_FragColor = vec4(r, g, b, 1.0);"
"}"
Upvotes: 0
Reputation: 131461
If you are willing to use an iOS 5-only solution, take a look at the sample app ChromaKey from the 2011 WWDC session on AVCaptureSession.
That demo captures 30 FPS of video from the built-in camera and passes each frame to OpenGL as a texture. It then uses OpenGL to manipulate the frame, and optionally writes the result out to an output video file.
The code uses some serious low-level magic to bind a Core Video pixel buffer from an AVCaptureSession to an OpenGL texture so that they share memory in the graphics hardware.
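The key piece of that magic is the CVOpenGLESTextureCache API that arrived in iOS 5. A sketch of the zero-copy binding step, assuming you already have an EAGLContext (eaglContext here) and receive frames as CVPixelBufferRefs (pixelBuffer); the variable names are mine:

#include <CoreVideo/CVOpenGLESTextureCache.h>
#include <OpenGLES/ES2/gl.h>
#include <OpenGLES/ES2/glext.h>

// One-time setup: create a texture cache tied to the GL context.
CVOpenGLESTextureCacheRef textureCache;
CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, eaglContext, NULL, &textureCache);

// Per frame: wrap the pixel buffer as a GL texture with no CPU copy.
CVOpenGLESTextureRef texture;
CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache,
    pixelBuffer, NULL, GL_TEXTURE_2D, GL_RGBA,
    (GLsizei)CVPixelBufferGetWidth(pixelBuffer),
    (GLsizei)CVPixelBufferGetHeight(pixelBuffer),
    GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);
glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture));
// ... run your shader over the frame here ...
CFRelease(texture);
CVOpenGLESTextureCacheFlush(textureCache, 0); // recycle cache entries each frame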
It should be fairly straightforward to change the AVCaptureSession to use a movie file as input rather than camera input.
You could probably set up the session to deliver frames in Y/UV form rather than RGB, where the Y component is luminance. Failing that, it would be a pretty simple matter to write a shader that would convert RGB values for each pixel to luminance values.
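For the luminance-only case you may not even need a shader: if the frames arrive in a biplanar Y/UV format, plane 0 is the luminance and can be walked directly on the CPU. A minimal sketch, assuming frames are delivered as kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange (the function name is mine):

#include <CoreVideo/CoreVideo.h>

void ProcessLuminance(CVPixelBufferRef pixelBuffer)
{
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    // Plane 0 of a biplanar 4:2:0 buffer holds the 8-bit Y (luma) samples.
    const uint8_t *yPlane = (const uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
    size_t width  = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
    size_t height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
    size_t stride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);

    for (size_t row = 0; row < height; row++)
    {
        const uint8_t *luma = yPlane + row * stride; // rows may be padded, so step by the stride
        // ... inspect luma[0 .. width - 1] here ...
    }
    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
}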
You should be able to do all of this on ALL frames, not just every 10th frame.
Upvotes: 3
Reputation: 8677
vImage might be appropriate, assuming you can use iOS 5. Processing every 10th frame seems well within reason for a framework like vImage. However, any kind of true real-time processing is almost certainly going to require OpenGL.
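For illustration, a sketch of an RGB-to-luminance pass with vImage, assuming frames are available as 8-bit ARGB buffers; the matrix applies Rec. 601 luma weights (0.299, 0.587, 0.114) scaled by a divisor of 256, and the helper name is mine:

#include <Accelerate/Accelerate.h>

vImage_Error ConvertToLuma(const vImage_Buffer *src, const vImage_Buffer *dest)
{
    const int16_t matrix[16] = {
    //   A    R    G    B      <- destination channels
       256,   0,   0,   0,  // from source A (pass through)
         0,  77,  77,  77,  // from source R (0.299 * 256)
         0, 150, 150, 150,  // from source G (0.587 * 256)
         0,  29,  29,  29,  // from source B (0.114 * 256)
    };
    // Writes the same luma value into each of R, G and B.
    return vImageMatrixMultiply_ARGB8888(src, dest, matrix, 256, NULL, NULL, kvImageNoFlags);
}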
Upvotes: 0