stefcud

gzip a big file using multiple CPU cores

How can I use all the CPU cores of my server (it has 4 cores, running Debian Linux under OpenVZ) to gzip one big file faster?

I am trying to use these commands, but I cannot put the pieces together:

get the number of cores: CORES=$(grep -c '^processor' /proc/cpuinfo)

this to split the big file into pieces: split -b100 file.big

this to run gzip on multiple cores: find /source -type f -print0 | xargs -0 -n 1 -P $CORES gzip --best

I don't know if this is the best way to optimize gzipping big files.
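For reference, here is a rough sketch of how these pieces might fit together. The 100M chunk size and the file names are arbitrary choices for illustration, and the approach relies on the fact that concatenated gzip members form a valid stream that gunzip can decode:

CORES=$(grep -c '^processor' /proc/cpuinfo)
split -b 100M file.big file.big.part.                    # split into 100 MB chunks
ls file.big.part.* | xargs -n 1 -P "$CORES" gzip --best  # compress the chunks in parallel
cat file.big.part.*.gz > file.big.gz                     # join: concatenated gzip members are a valid .gz
rm file.big.part.*.gz                                    # remove the per-chunk files

A plain gunzip file.big.gz then recovers the original data, though compressing independent chunks costs a little compression ratio compared with a single stream.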

Upvotes: 2

Views: 4695

Answers (2)

Mark Setchell

Try GNU Parallel

cat bigfile | parallel --pipe --recend '' -k gzip -9 >bigfile.gz

This will use all your cores to gzip in parallel.

By way of comparison, on my Mac running OS X Mavericks, with a 6.4 GB file on a solid-state disk, this command

time gzip -9 <bigger >/dev/null

takes 4 minutes 23 seconds and uses 1-2 CPUs at around 50%.

Whereas the GNU Parallel version below

time cat bigger | parallel --pipe --recend '' -k gzip -9 >/dev/null

takes 1 minute 44 seconds and keeps all 8 cores 80+% busy. A very significant difference, with GNU Parallel running in under 40% of the time of the simplistic approach.
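One way to sanity-check the result is to write the parallel-compressed output to a file and run gzip's built-in integrity test on it, for example:

parallel --pipe --recend '' -k gzip -9 < bigfile > bigfile.gz
gzip -t bigfile.gz    # exits with an error if any of the concatenated members is corrupt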

Upvotes: 2

Mark Adler

Use pigz, a parallel gzip implementation.

Unlike parallel with gzip, pigz produces a single gzip stream.
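A minimal usage sketch (assuming pigz is installed, e.g. with apt-get install pigz on Debian; by default it uses all available cores):

pigz -9 file.big                    # compresses to file.big.gz using every core, removes the original
pigz -9 -k -p 4 file.big            # keep the original and limit to 4 threads (matching a 4-core box)
pigz -9 < file.big > file.big.gz    # or use it as a filter, just like gzip

The output is a standard gzip file, so plain gunzip can decompress it.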

Upvotes: 4
