Jakim

Reputation: 1813

How to optimize pigz?

I am using pigz to compress a large directory of nearly 50 GB. I have an EC2 instance running RedHat; the instance type is m4.xlarge, which has 4 CPUs. I expected the compression to eat up all my CPUs and give better performance, but it didn't meet my expectation.

The command I am using:

tar -cf - large-dir | pigz > dest.tar.gz

But while the compression is running, mpstat -P ALL shows a lot of %idle on the other 3 CPUs, with only about 2% used by user-space processes on each CPU.

I also checked with top, which shows pigz using less than 10% of a CPU.

I tried -p 10 to increase the thread count. Usage was high for a few minutes, but dropped once the output file reached 2.7 GB.
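
For reference, this is the same pipeline with the thread count raised (assuming the directory name from above; pigz's -p flag sets the number of compression threads):

tar -cf - large-dir | pigz -p 10 > dest.tar.gz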

The machine is dedicated to this compression job, so I want to fully utilize all of my resources for the best performance. How can I get there?

Upvotes: 2

Views: 2562

Answers (1)

Erik S.

Reputation: 53

If file compression apps aren't CPU bound, they are most likely sequential I/O bound.

You can investigate this by looking at the percentage of time the system spends in iowait ('wa'), using top or mpstat (check the manpage for options if it isn't part of the default output).
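
For example, a sampled view with either tool (mpstat ships in the sysstat package):

# per-CPU stats every 5 seconds; watch the %iowait column
mpstat -P ALL 5

# in top, iowait is the 'wa' field in the CPU summary line
top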

If I'm right, most of the time that the system isn't executing pigz is spent waiting on I/O.
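
One quick way to test that theory is to take the output disk out of the pipeline; if CPU usage jumps when writing to /dev/null, the write path was the bottleneck (a throwaway diagnostic, reusing the directory name from the question):

tar -cf - large-dir | pigz > /dev/null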

You can also investigate this further using iostat, which shows disk I/O. The ratio between reads and writes will vary over time depending on how compressible the input is at that moment, but the combined I/O should be fairly consistent. This assumes Amazon's storage provisioning now provides consistent I/O, something that didn't used to be the case.
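
For example (iostat is also part of the sysstat package):

# extended per-device stats in MB/s, refreshed every 5 seconds;
# watch the rMB/s, wMB/s and %util columns
iostat -xm 5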

Upvotes: 5
