Reputation: 401
I'm an intern testing a transparent caching solution that we've plunked down on a server in our lab. To do so, I'm pre-caching some files for a test involving new and already-cached content.
However, that machine has 48 GB of RAM and we're using small (16 KB) files for the test, so to make sure requests actually keep going back to our TC solution rather than being served straight from memory, I'm caching about six million of these files. Yikes.
I've been running the following bash script in the hope of spinning off a bunch of parallel processes so that this pre-caching takes a manageable amount of time (ignore the dummy IP):
for ((i=0; i<1000; i++)); do for ((j=6000*i; j<6000*(i+1); j++)); do curl x.x.x.x/originDir/test16k_${j}.txt > /dev/null 2>&1 & done; wait; done
However, I'm still only getting about 1000 files cached every few seconds over our 10 Gbps fiber link, which is about what I got doing sequential curls. For six million files, that's going to be a lot of seconds.
Does anyone know a better way to go about this?
Many thanks, RS
Upvotes: 1
Views: 91
Reputation: 3522
One change would be to make use of curl's range globbing (it also supports a step counter) so that curl does the iterations instead of bash. This should speed things up because you avoid paying the bash interpreter and curl process start-up cost for every single file.
curl x.x.x.x/originDir/test16k_[0-5999].txt
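Applied to your full data set, a minimal sketch, assuming the files really are named test16k_0.txt through test16k_5999999.txt and you still just want to throw the bodies away, would be:

curl -s "x.x.x.x/originDir/test16k_[0-5999999].txt" > /dev/null

Besides skipping one process start per file, a single curl invocation will typically reuse the connection across the globbed URLs instead of opening a new one each time. If one process can't fill your 10 Gbps link, you could split the range into chunks and background one curl per chunk, roughly along the lines of your existing outer loop (you may want fewer chunks in flight at once):

# rough sketch: 1000 background curls, each fetching one block of 6000 files
for ((i=0; i<1000; i++)); do
  curl -s "x.x.x.x/originDir/test16k_[$((6000*i))-$((6000*i+5999))].txt" > /dev/null &
done
wait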
Upvotes: 1