Reputation: 5203
I have a Perl script that writes out a very large log file. Currently I write the file in the 'traditional' Perl way:
open FILE, ">", 'log.txt';
print FILE $line;
.....
close FILE;
I've heard a lot of good things about File::Slurp for reading in files, and how it can vastly improve runtimes. My question is, would using File::Slurp make writing out my log file any faster? I ask because writing out a file in Perl seems pretty simple as it is; I don't see how File::Slurp could optimize it much further.
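For reference, the File::Slurp counterpart I have in mind would look roughly like this (just a sketch; the file name and data are placeholders):
use File::Slurp qw(write_file);
# collect every log line in memory first ...
my @lines;
push @lines, "some log entry\n";    # stands in for the real logging calls
# ... then write them all out in a single call at the end
write_file('log.txt', @lines);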
Upvotes: 5
Views: 3943
Reputation: 126722
The File::Slurp utilities may, under certain circumstances, be fractionally faster overall than the equivalent streamed implementation, but file I/O is so very much slower than anything based solely on memory and CPU speed that it is almost always the limiting resource.
I have never heard any claims that File::Slurp can vastly improve runtimes, and I would appreciate seeing a reference to that effect. The only way I could see it being a more efficient solution is if the program requires random access to the file or has to read it multiple times. Because the data is all in memory at once there is no overhead in accessing any of it, but in that case my preference would be for Tie::File, which makes it appear as if the data is all available simultaneously, with little speed impact and far less memory overhead.
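A minimal Tie::File sketch of that random-access style, assuming a file called log.txt:
use Tie::File;

# map the file's lines onto an array; lines are fetched from disk on demand
tie my @lines, 'Tie::File', 'log.txt' or die "Cannot tie log.txt: $!";

print $lines[0],  "\n";      # first line
print $lines[-1], "\n";      # last line
$lines[3] = 'replacement';   # rewrites that record in the file

untie @lines;                # flush and release the file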
In fact it may well be that a call to read_file makes the process seem much slower to the user. If the file is significantly large then the time taken to read all of it and split it into lines may amount to a distinct delay before processing can start, whereas opening a file and reading the first line will usually appear to be instantaneous.
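To illustrate the difference, here is a sketch that counts lines both ways, assuming a file called log.txt:
use File::Slurp qw(read_file);

# slurp: nothing is available until the whole file has been read and split
my @lines = read_file('log.txt');
print scalar @lines, " lines slurped\n";

# stream: the first line can be handled as soon as it has been read
open my $fh, '<', 'log.txt' or die $!;
my $count = 0;
while ( my $line = <$fh> ) {
    $count++;    # per-line processing would go here
}
close $fh;
print "$count lines streamed\n";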
The same applies at the end of the program. A call to write_file, which combines the data into disk blocks and pages it out to the file, will take substantially longer than simply closing the file.
In general the traditional streaming output method is preferable. It has little or no speed impact, and it avoids data loss by saving the data incrementally instead of waiting until a vast swathe of data has been accumulated in memory, only to discover that it cannot be written to disk for one reason or another.
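Here is a sketch of that incremental approach with write failures surfaced as they happen; the file name and @lines are placeholders for the real output:
my @lines = ( "first entry\n", "second entry\n" );    # stands in for the real log data

open my $fh, '>', 'log.txt' or die "Cannot open log.txt: $!";

for my $line (@lines) {
    print {$fh} $line
        or die "Write to log.txt failed: $!";    # a failed write is reported at once
}

close $fh
    or die "Could not close log.txt: $!";        # catches errors flushed at close time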
My advice is to reserve File::Slurp for occasions when you have small files and random access could significantly simplify the program code. Even then there is nothing wrong with
my @data = do {
open my $fh, '<', 'my_file' or die $!;
<$fh>;
};
for input, or
open my $fh, '>', 'out_file' or die $!;
print { $fh } $_ for @data;
for output. Particularly in your case, where you are dealing with a very large log file, I think there is no question that you should stick to streamed output methods.
Upvotes: 9
Reputation:
File::Slurp is mostly a convenience module. Instead of writing the usual open, while read/write, close code, you have the one-liners read_file and write_file.
However, I don't know that it is any faster than your own code. It is coded in Perl, not in C. Also, if you use the array variant, write_file($file_name, @lines), it may be somewhat inefficient in terms of memory, because it first joins all of the lines into a single scalar before writing that out.
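In other words, the array form behaves roughly like this (a simplified sketch of the effect, not File::Slurp's actual source):
my @lines  = ( "one\n", "two\n" );    # example data
my $buffer = join '', @lines;         # a second, full copy of the data now exists in memory
# $buffer is then pushed out to the file in one go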
However, it does use syswrite instead of buffered writes. It can safely do that because it is the only function accessing the file handle during its lifetime. So yes, it might be faster because of that.
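For comparison, an unbuffered write with syswrite looks like this (a sketch; the file name and data are placeholders). Mixing syswrite with buffered print on the same handle is best avoided, which is why write_file can only do this because it owns the handle for its whole lifetime:
open my $fh, '>', 'log.txt' or die "Cannot open log.txt: $!";

my $data  = "a log line\n";           # stands in for the real output
# syswrite bypasses PerlIO buffering and issues the write system call directly
my $bytes = syswrite( $fh, $data );
defined $bytes or die "syswrite failed: $!";

close $fh or die "Could not close log.txt: $!";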
Upvotes: 8