Gordon

Reputation: 1651

Perl: performance hit with reading multiple files

I was wondering which approach is better in this case.

I have to read in thousands of files. I was thinking of opening each file, reading it, and closing it. Or should I cat all the files into one file and read that?

Suggestions? This is all in Perl.
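
For concreteness, here is a minimal sketch of the first approach (the glob pattern and the processing step are just placeholders):

    use strict;
    use warnings;

    # Hypothetical file set; adjust the pattern to match your data.
    my @files = glob('data/*.txt');

    for my $file (@files) {
        open my $fh, '<', $file or die "Can't open $file: $!";
        while (my $line = <$fh>) {
            # process $line here
        }
        close $fh or warn "Can't close $file: $!";
    }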

Upvotes: 0

Views: 362

Answers (4)

berekuk

Reputation: 186

Note that cat * can fail if the number of files is greater than your ulimit -n value, so a sequential read can actually be safer. Also, consider using opendir and readdir instead of glob if all your files are located in the same directory.
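
A minimal sketch of the opendir/readdir variant (the directory name is a placeholder); the reading loop itself stays the same as in the question:

    use strict;
    use warnings;

    my $dir = 'data';    # placeholder directory
    opendir my $dh, $dir or die "Can't open directory $dir: $!";
    my @files = grep { -f "$dir/$_" } readdir $dh;   # skip ., .. and subdirectories
    closedir $dh;

    # @files now holds bare filenames; prefix them with "$dir/" when opening.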

Upvotes: 1

Daniel Böhmer

Reputation: 15381

If the time spent cat-ing all the files into one bigger file doesn't matter, then reading that single file will be faster (but only if you read it sequentially, which is the default).

Of course, if the concatenation step itself is taken into account, the whole process will be much slower, because you have to read, write, and then read again.

In general, reading one file of 1000 MB should be faster than reading 100 files of 10 MB each, because for the 100 files you also have to look up each file's metadata.

As tchrist says, the performance difference might not be important. I think it depends on the kind of files involved (e.g. for a huge number of very small files the difference would be much larger) and on the overall performance of your system and its storage.
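
If you want to measure this on your own system rather than guess, here is a rough sketch using the core Benchmark module (the file set and the pre-concatenated file are placeholders):

    use strict;
    use warnings;
    use Benchmark qw(timethese);

    my @files    = glob('data/*.txt');   # placeholder: the many small files
    my $big_file = 'all_data.txt';       # placeholder: the same data cat-ed into one file

    # Note: after the first pass the data will mostly come from the OS cache,
    # which can hide much of the difference between the two strategies.
    timethese(10, {
        many_files => sub {
            for my $file (@files) {
                open my $fh, '<', $file or die "Can't open $file: $!";
                1 while <$fh>;           # read and discard every line
                close $fh;
            }
        },
        one_file => sub {
            open my $fh, '<', $big_file or die "Can't open $big_file: $!";
            1 while <$fh>;
            close $fh;
        },
    });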

Upvotes: 2

Mike Thomsen

Reputation: 37506

Just read the files sequentially. Perl's file I/O functions are pretty thin wrappers around the native file I/O calls in the OS, so there isn't much point in fretting about the performance of simple file I/O.
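
One idiomatic way to do that, assuming the list of filenames fits comfortably in memory, is to put it in @ARGV and read everything through the diamond operator as a single stream:

    use strict;
    use warnings;

    # Reads the files one after another as a single stream, no cat needed.
    @ARGV = glob('data/*.txt');          # placeholder file set
    die "No input files found\n" unless @ARGV;

    while (my $line = <>) {
        # $ARGV holds the name of the file currently being read
        # process $line here
    }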

Upvotes: 0

tchrist

Reputation: 80384

It shouldn't make that much of a difference. This sounds like premature optimization to me.

Upvotes: 6
