I would like to upscale only the keyframes in an H.264 video. I've been trying to find where in FFmpeg's C source code I can get hold of the keyframe in order to scale it. I'm not sure whether to apply the scaling in the packet-parsing part or in the decoding part.
I also don't know whether upscaling the B & P frames would be enough to avoid damaging the video frames.
I hope you can guide me, as there isn't sufficient documentation about FFmpeg. If you have any suggestions, please let me know.
"The B & P frames are reconstructed through motion compensation during video decoding, while the keyframes only need intra prediction. So I just want to upscale the keyframes using the neural network. After that, the B & P frames should also gain visual quality when they are reconstructed using motion compensation."
Your idea is good, but it won't work with H.264. The codec is too obsessed with compression: it does crazy things like building a single video frame from some parts of the last IDR frame and some parts of P-frames (basically mixing macroblocks from any past and future frames to produce the current frame). Keyframes can be of two types: IDR (a full image that also resets the reference list, so later frames cannot reference anything before it) and non-IDR I-frames (also intra-coded, but later P/B-frames can still pull parts from frames decoded before the I-frame).
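To tell the two keyframe types apart you have to look at the NAL header and, for non-IDR slices, at slice_type in the slice header. Below is a minimal C sketch, assuming `nal` points at the NAL header byte right after a 00 00 01 start code. The names (`classify_nal`, `SliceKind`, `BitReader`) are hypothetical, and emulation-prevention bytes (00 00 03) are ignored, which is usually safe for the first two slice-header fields but not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { SLICE_IDR, SLICE_NON_IDR_I, SLICE_P, SLICE_B, NAL_OTHER } SliceKind;

/* Minimal MSB-first bit reader (hypothetical helper). */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t bitpos;
} BitReader;

static int read_bit(BitReader *br) {
    if (br->bitpos >= br->len * 8) return -1;                /* ran out of data */
    int bit = (br->data[br->bitpos / 8] >> (7 - br->bitpos % 8)) & 1;
    br->bitpos++;
    return bit;
}

/* Unsigned Exp-Golomb ue(v): N leading zeros, a 1, then N info bits. */
static int64_t read_ue(BitReader *br) {
    int zeros = 0, bit;
    while ((bit = read_bit(br)) == 0) {
        if (++zeros > 31) return -1;
    }
    if (bit < 0) return -1;
    int64_t value = 1;
    for (int i = 0; i < zeros; i++) {
        if ((bit = read_bit(br)) < 0) return -1;
        value = (value << 1) | bit;
    }
    return value - 1;
}

/* nal points at the NAL header byte that follows a 00 00 01 start code. */
SliceKind classify_nal(const uint8_t *nal, size_t len) {
    if (len < 2) return NAL_OTHER;
    int nal_unit_type = nal[0] & 0x1F;
    if (nal_unit_type == 5) return SLICE_IDR;     /* IDR: earlier references are discarded */
    if (nal_unit_type != 1) return NAL_OTHER;     /* SPS (7), PPS (8), SEI (6), ... */
    BitReader br = { nal + 1, len - 1, 0 };
    if (read_ue(&br) < 0) return NAL_OTHER;       /* skip first_mb_in_slice */
    int64_t slice_type = read_ue(&br);
    if (slice_type < 0) return NAL_OTHER;
    switch (slice_type % 5) {                     /* values 5..9 repeat 0..4 */
        case 0: return SLICE_P;
        case 1: return SLICE_B;
        case 2: return SLICE_NON_IDR_I;           /* intra-coded, but references are kept */
        default: return NAL_OTHER;                /* SP / SI slices */
    }
}
```

For example, a non-IDR slice starting 0x41 0x88 decodes as first_mb_in_slice = 0 and slice_type = 7, i.e. an I slice that does not reset the references.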
To avoid the above problems, you need to create your own custom system that recreates or emulates the H.264 macroblocks and motion vectors and applies them to the pixels of the upscaled keyframes. Everything becomes easier once you work at macroblock level (e.g. to detect motion direction).
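As a rough illustration of working at macroblock level, here is a hypothetical C sketch that predicts one 16x16 macroblock of an upscaled P-frame by reusing its original motion vector on the upscaled reference. It assumes integer-pel motion vectors, an integer scale factor and 8-bit luma planes sharing one stride. Real H.264 uses quarter-pel vectors with interpolation, partitions smaller than 16x16, and a residual that would also need upscaling; none of that is handled here.

```c
#include <stdint.h>

/* Repeat edge pixels for out-of-picture references, as H.264 motion compensation does. */
static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Hypothetical sketch: predict one macroblock of the upscaled frame from the upscaled reference.
 *   width, height, stride: dimensions of the UPSCALED planes (ref and dst share stride)
 *   mb_x, mb_y: macroblock column/row in the ORIGINAL resolution
 *   mv_x, mv_y: integer-pel motion vector in ORIGINAL-resolution pixels
 *   scale:      integer upscale factor (e.g. 2)
 */
void predict_mb_upscaled(const uint8_t *ref, int width, int height, int stride,
                         uint8_t *dst, int mb_x, int mb_y,
                         int mv_x, int mv_y, int scale) {
    int size = 16 * scale;                   /* macroblock edge in upscaled pixels */
    int x0 = mb_x * size, y0 = mb_y * size;  /* top-left corner in the upscaled frame */
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int dx = x0 + x, dy = y0 + y;
            if (dx >= width || dy >= height) continue;  /* partial MB at the picture edge */
            int sx = clamp_int(dx + mv_x * scale, 0, width - 1);
            int sy = clamp_int(dy + mv_y * scale, 0, height - 1);
            dst[dy * stride + dx] = ref[sy * stride + sx];
        }
    }
}
```

Scaling the vector by the same factor as the image keeps each block pointing at the same content it referenced in the original frame.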
FFmpeg won't let you do custom keyframe replacements, but you can code that easily (after practicing in a hex editor)...
For a quick test (to replace an IDR keyframe):
For practice, try using small images (16x16 or 32x32) with short durations like 5 seconds or less...
(1) Encode your upscaled keyframe image into .h264 format using FFmpeg. Example command:
ffmpeg -i upscaled_keyframe.png -c:v libx264 -profile:v baseline -level:v 3.0 -pix_fmt yuv420p out_upscaled_key.h264
(2) Use a hex editor to replace the first keyframe's bytes in the main video with the upscaled keyframe's bytes. You'll find the first keyframe by its start code 00 00 01 65 (0x65 is the NAL header of an IDR slice), and the next 00 00 01 XX marks its end, i.e. the start of the next NAL unit, where XX might be 41 (a non-IDR slice) or similar. Start codes may also appear in the 4-byte form 00 00 00 01.
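The same search can be automated. A minimal C sketch, assuming the whole .h264 file (Annex B byte stream) is already in memory; `find_first_idr` and `find_start_code` are hypothetical names. Like the manual method above, it treats the keyframe as a single NAL unit, so a keyframe split into several slices would need the end extended over the following 0x65 NAL units.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the next 00 00 01 pattern at or after `from`, or `len` if there is none. */
static size_t find_start_code(const uint8_t *buf, size_t len, size_t from) {
    for (size_t i = from; i + 3 <= len; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return len;
}

/*
 * Find the first IDR slice (nal_unit_type 5, e.g. header byte 0x65).
 * On success, [*start, *end) covers its start code (3 or 4 bytes) and payload,
 * stopping right before the next start code; returns 1. Returns 0 if none is found.
 */
int find_first_idr(const uint8_t *buf, size_t len, size_t *start, size_t *end) {
    for (size_t pos = find_start_code(buf, len, 0); pos + 3 < len;
         pos = find_start_code(buf, len, pos + 3)) {
        if ((buf[pos + 3] & 0x1F) == 5) {
            /* Include the extra leading 00 of a 4-byte start code 00 00 00 01. */
            *start = (pos > 0 && buf[pos - 1] == 0) ? pos - 1 : pos;
            size_t next = find_start_code(buf, len, pos + 3);
            /* A NAL unit never ends in 00, so a 00 here belongs to the next start code. */
            *end = (next < len && buf[next - 1] == 0) ? next - 1 : next;
            return 1;
        }
    }
    return 0;
}
```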
(3) Ideally you want to use the baseline profile, since it creates no B-frames. B-frames will mess up your idea (i.e. the encoder can use a future, unscaled frame to build parts of frames that also depend on the upscaled image). This means the main video should be encoded as baseline, and the same goes for the new keyframe. You cannot mix profile types.
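To confirm that both files really are baseline, you can read profile_idc, which is the byte right after the SPS NAL header (0x67); baseline is 66 (0x42). A small C sketch with hypothetical helper names, assuming both streams are in memory as Annex B byte streams:

```c
#include <stddef.h>
#include <stdint.h>

/* profile_idc of the first SPS (nal_unit_type 7) in the buffer, or -1 if there is no SPS. */
int first_sps_profile_idc(const uint8_t *buf, size_t len) {
    for (size_t i = 0; i + 4 < len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1 && (buf[i + 3] & 0x1F) == 7)
            return buf[i + 4];   /* first byte after the SPS NAL header */
    }
    return -1;
}

/* 1 if both the main video and the new keyframe stream declare the baseline profile (66). */
int both_baseline(const uint8_t *main_vid, size_t main_len,
                  const uint8_t *key, size_t key_len) {
    return first_sps_profile_idc(main_vid, main_len) == 66 &&
           first_sps_profile_idc(key, key_len) == 66;
}
```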
(4) Save edits in the hex editor and then test in a media player (or, in FFmpeg, just use -codec copy to first convert the H.264 into an MP4 file).
(5) Write code to create an "output" byte array. Into it, copy the main video's bytes from its start up to the beginning of its first keyframe, then append your upscaled keyframe bytes, then copy the main video's bytes from its next frame until the end.
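A minimal C sketch of step (5). It assumes you already know the byte offsets of the old keyframe (e.g. by searching for the start codes described in step (2)) and that `new_key` holds only the upscaled IDR slice NAL unit(s), without the SPS/PPS the encoder wrote for it, since the output keeps the main video's SPS and PPS. `splice_keyframe` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build output = main[0, key_start) + new_key + main[key_end, main_len).
 * Returns a malloc'd buffer (caller frees) and stores its size in *out_len, or NULL on error.
 */
uint8_t *splice_keyframe(const uint8_t *main_vid, size_t main_len,
                         size_t key_start, size_t key_end,
                         const uint8_t *new_key, size_t new_key_len,
                         size_t *out_len) {
    if (key_start > key_end || key_end > main_len) return NULL;  /* invalid range */
    size_t tail = main_len - key_end;
    size_t total = key_start + new_key_len + tail;
    uint8_t *out = malloc(total > 0 ? total : 1);
    if (!out) return NULL;
    memcpy(out, main_vid, key_start);                                 /* SPS, PPS (SEI) ... */
    memcpy(out + key_start, new_key, new_key_len);                    /* upscaled keyframe */
    memcpy(out + key_start + new_key_len, main_vid + key_end, tail);  /* frame 2 to the end */
    *out_len = total;
    return out;
}
```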
Your "output" file structure should look like this:
output H.264 file = [main-vid's SPS and PPS (and possibly SEI)] --> [ upscaled keyframe as first frame ] --> [ main-vid's frame 2 until ending... ]
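Before opening the result in a player, a quick sanity check of that layout can help. A hypothetical C sketch that walks the start codes and confirms an SPS (type 7) and a PPS (type 8) appear before the first slice, and that this first slice is an IDR (type 5); other NAL types such as SEI (6) are skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the buffer starts with SPS + PPS (+ optional SEI) followed by an IDR slice, else 0. */
int check_output_layout(const uint8_t *buf, size_t len) {
    int seen_sps = 0, seen_pps = 0;
    for (size_t i = 0; i + 3 < len; i++) {
        if (buf[i] != 0 || buf[i + 1] != 0 || buf[i + 2] != 1) continue;
        int type = buf[i + 3] & 0x1F;
        if (type == 7) seen_sps = 1;
        else if (type == 8) seen_pps = 1;
        else if (type == 1 || type == 5)      /* first coded slice reached */
            return type == 5 && seen_sps && seen_pps;
        i += 3;                               /* skip the start code and NAL header */
    }
    return 0;                                 /* no slice found */
}
```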