Kirill K

Reputation: 405

Slow VP8 and VP9 encoding with ffmpeg

I saw this answer, but it's a little old. Maybe the situation has changed?

I want to re-encode a stream from an IP camera to WebM (VP8 or VP9) with ffmpeg. I need real-time speed, but my CPU is a 2017 Core i5 and it is already too busy (load average well over 100%).

At the moment I'm using this command (with overlay chroma key):

./ffmpeg \
-i bg.jpg \
-thread_queue_size 512 \
-rtsp_transport tcp -i rtsp://ip_cam:port/stream \
-codec:v libvpx -quality realtime -r 25 -crf 30 \
-b:v 2M -qmin 10 -qmax 50 -maxrate 2.5M -bufsize 5M \
-speed 1 \
-cpu-used 0 -threads 4 \
-auto-alt-ref 0 \
-c:a libopus -b:a 96k \
-filter_complex "[1:v]chromakey=0x70de77:0.1:0.0[ckout];[0:v][ckout]overlay[out]" \
-map "[out]" \
-f webm udp://ip_destination:1935/name/stream

Upvotes: 4

Views: 18037

Answers (2)

Tom B

Reputation: 2923

Switch to vp9_vaapi if it is available

Using libvpx-vp9 I was getting 3-5 fps at 1080p, which is painfully slow if you're trying to convert an hour of video.

If your GPU supports it, using vp9_vaapi can be much, much faster. On my HTPC with an i7-8650U, VAAPI gives about 30x better performance: I can encode 4 videos at once at 130-150 fps each.

Sample ffmpeg line:

 ffmpeg -vaapi_device /dev/dri/renderD128 -i "$infile" -vf 'format=nv12,hwupload' -c:v vp9_vaapi -b:v 0 -c:a libvorbis "$outfile"
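
Not part of the original answer, but if your GPU can also decode the source codec, a variant that keeps frames in GPU memory (skipping the software decode and hwupload step) may be worth trying. This is only a sketch; the device path and flags are the usual VA-API ones and may need adjusting for your setup:

  # sketch: decode in hardware too, so frames never leave the GPU
  # (a format conversion filter may still be needed for some sources)
  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi \
    -i "$infile" -c:v vp9_vaapi -b:v 0 -c:a libvorbis "$outfile"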

There is an option, loop_filter_level, which seems to be equivalent to CRF and goes from 0-63. However, it is poorly documented online beyond the fact that the default is 16. I tried it at 1 and at 63; the file size and subjective quality were practically identical, so either I'm using it wrong or the option is ignored by ffmpeg.

Using default settings I could not see any visual difference between my 1080p h264 source video and the vp9 output.

You'll need to check that your GPU supports hardware VP9 encoding. Run vainfo and look for:

  VAProfileVP9Profile0            : VAEntrypointEncSlice
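
A quick way to check, assuming the vainfo tool is installed (it ships in the libva-utils package on most distributions), is to filter its output for the VP9 encode entrypoint:

  # quick check: does the driver expose a VP9 encoder?
  vainfo 2>/dev/null | grep -i "VP9.*Enc"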

vp9_vaapi vs libvpx-vp9

I tried encoding the same 50-minute 1080p video, with these results:

  • libvpx-vp9 took nearly 8 hours and produced a 568.8 MB file
  • vp9_vaapi -loop_filter_level 1 took just over 7 minutes and produced a 756.1 MB file
  • vp9_vaapi -loop_filter_level 63 took just over 8 minutes and produced a 734.1 MB file

Subjectively all the videos look the same to me and I could not tell one from the other.

Clearly, libvpx-vp9 wins on compression but unless you're very, very starved for disk space (or bandwidth if you're planning to stream the video), it is absolutely not worth the unreasonable encoding time.

I don't know why loop_filter_level makes so little difference; I would suggest leaving it at the default (16) until it is better documented.

All the usual caveats apply. libvpx will no doubt mature over time, your hardware may produce different results, and hardware encoders often give worse visual quality than software ones (though I could not tell in my test).

Upvotes: 11

slhck

Reputation: 38672

The speed/quality options for VP8/VP9 are explained in the documentation. Note that in ffmpeg, you have to specify the parameters differently (see ffmpeg -h encoder=libvpx-vp9):

  • CPU Usage:
    • ffmpeg: -cpu-used (legacy option: -speed)
    • libvpx: --cpu-used
  • Quality / Deadline:
    • ffmpeg: -deadline realtime, -deadline good (legacy option: -quality)
    • libvpx: --rt, --good
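
As a concrete illustration of the ffmpeg spellings (the file names, bitrate, and cpu-used value below are arbitrary placeholders, not a recommendation):

  # illustrative only: ffmpeg's -deadline/-cpu-used equivalents of libvpx's --good/--cpu-used
  ffmpeg -i input.mp4 -c:v libvpx-vp9 -deadline good -cpu-used 2 -b:v 2M -c:a libopus output.webm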

The -cpu-used option should be your main control knob. While the default is 0, the documentation says:

Setting --cpu-used=1 or --cpu-used=2 will give further significant boosts to encode speed, but will start to have a more noticeable impact on quality and may also start to affect the accuracy of the data rate control.

Setting a value of 4 or 5 will turn off "rate distortion optimisation" which has a big impact on quality, but also greatly speeds up the encoder.

For live encoding particularly, you want to set -deadline realtime:

--rt Real-time mode allows the encoder to auto adjust the speed vs. quality trade-off in order to try and hit a particular cpu utilisation target. In this mode the --cpu-used parameter controls the %cpu target as follows:

target cpu utilisation = (100*(16-cpu-used)/16)%

Legal values for --cpu-used when combined with --rt mode are 0-15.

It is worth noting that in --rt mode the encode quality will depend on how hard a particular clip or section of a clip is and how fast the encoding machine is. In this mode the results will thus vary from machine to machine and even from run to run depending on what else you are doing.
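
Applied to your live stream, a stripped-down sketch might look like the following (the chroma-key overlay is omitted for brevity, and -cpu-used 8 is only a guess to be tuned against your actual CPU load):

  # sketch: realtime deadline for live VP8 encoding; raise -cpu-used further
  # if the encoder still cannot keep up with the camera
  ffmpeg -rtsp_transport tcp -i rtsp://ip_cam:port/stream \
    -c:v libvpx -deadline realtime -cpu-used 8 -threads 4 \
    -b:v 2M -maxrate 2.5M -bufsize 5M -r 25 \
    -c:a libopus -b:a 96k \
    -f webm udp://ip_destination:1935/name/stream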

But of course, with an i5 CPU, investing in a beefier CPU from the latest Intel i7 series could make sense, depending on how many parallel transcoding tasks you run, what quality level you want to reach, and what the final latency should be.

Intel's Kaby Lake chips apparently support hardware-assisted encoding through Intel QuickSync, and ffmpeg supports that through VA-API.

Upvotes: 11
