Reputation: 563
Code sample 1:
def count_lines1(file_name)
  open(file_name) do |file|
    count = 0
    while file.gets
      count += 1
    end
    count
  end
end
Code sample 2:
def count_lines2(file_name)
  file = open(file_name)
  count = 0
  while file.gets
    count += 1
  end
  count
end
I am wondering which is the better way to implement the counting of lines in a file, in terms of good Ruby style.
Upvotes: 0
Views: 55
Reputation: 160551
which is the better way to implement the counting of lines in a file.
Neither. Ruby can do it easily using foreach:
def count_lines(file_name)
  lines = 0
  File.foreach(file_name) { lines += 1 }
  lines
end
If I run that against my ~/.bashrc:
$ ruby test.rb
37
foreach is very fast and avoids scalability problems because it reads the file one line at a time instead of loading the whole thing into memory.
Alternately, you could take advantage of tools in the OS, such as wc -l, which were written specifically for the task:
`wc -l .bashrc`.to_i
which will return 37 again. If the file is huge, wc will likely outrun doing it in Ruby because wc is written in compiled code.
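As a rough sketch of wrapping that call in a method: once the file name comes from a variable rather than being typed into the backticks, it is safer to skip the shell. The version below assumes a POSIX wc is on the PATH, and count_lines_wc is just an illustrative name, not anything built in. IO.popen with an array argument runs the command directly, so spaces or quotes in the file name need no escaping.

```ruby
# Hypothetical helper: count lines by delegating to the system's `wc -l`.
# Assumes a POSIX `wc` is available on the PATH.
def count_lines_wc(file_name)
  # The array form of IO.popen runs wc directly, without a shell,
  # so file names containing spaces or quotes are passed through as-is.
  output = IO.popen(["wc", "-l", file_name], &:read)
  # wc prints something like "37 .bashrc"; to_i reads the leading integer
  # and ignores any leading whitespace.
  output.to_i
end
```

Calling count_lines_wc(File.expand_path("~/.bashrc")) should give the same number as the backtick version.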
You can also read in large chunks with read and count newline characters.
Yes, read will allow you to do that, but the scalability issue will remain. In my environment, read or readlines can be a script killer because we often have to process files well into the tens of GB. There's plenty of RAM to hold the data, but the I/O suffers because of the overhead of slurping the data. "Why is "slurping" a file not a good practice?" goes into this.
An alternate way of reading in big chunks is to tell Ruby to read a set block size and count the line-ends in that block, looping until the file is read completely. I didn't test that method in the above linked answer, but in the past I did similar things when I was writing in Perl and found that it didn't really improve things, because it resulted in a bit more code. At that point, if all I was doing was counting lines, it'd make more sense to call wc -l and let it do the work, as it'd be a lot faster in coding time and most likely in execution time.
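For reference, the block-at-a-time approach described above could look something like the following. This is only a sketch and has not been benchmarked here; the method name count_lines_chunked and the 1 MB default block size are arbitrary choices for illustration.

```ruby
# Hypothetical helper: count newline characters by reading fixed-size blocks,
# so only one block is held in memory at a time.
def count_lines_chunked(file_name, block_size = 1024 * 1024)
  count = 0
  # Binary mode avoids any encoding or newline translation while reading.
  File.open(file_name, "rb") do |file|
    # IO#read with a positive length returns nil at end of file,
    # which ends the loop.
    while (chunk = file.read(block_size))
      count += chunk.count("\n")
    end
  end
  count
end
```

Like wc -l, this counts newline characters, so a final line with no trailing newline is not counted, whereas File.foreach would count it.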
Upvotes: 2