Davis Dimitriov

Reputation: 4257

Fastest Way to Parse a Large File in Ruby

I have a simple text file that is ~150 MB. My code reads each line, and if it matches certain regexes, writes it to an output file. But right now it takes a long time (several minutes) just to iterate through all of the lines of the file, doing it like this:

File.open(filename).each do |line|
  # do some stuff
end

I know that it is the looping through the lines of the file that is taking a while, because even if I do nothing with the data in "# do some stuff", it still takes a long time.

I know that some Unix programs can parse large files like this almost instantly (like grep), so I am wondering why Ruby (MRI 1.9) takes so long to read the file, and whether there is some way to make it faster.

Upvotes: 7

Views: 7724

Answers (3)

steenslag

Reputation: 80065

File.readlines(filename).each do |line|
  #do stuff with each line
end

This will read the whole file into one array of lines. It should be a lot faster, but it takes more memory.
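If memory becomes the limiting factor, one streaming alternative (a minimal sketch, using the same filename variable from the question) is IO.foreach, which yields one line at a time without building the whole array:

IO.foreach(filename) do |line|
  # only the current line is held in memory at a time
end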

Upvotes: 2

Zepplock

Reputation: 29135

You should read it into memory and then parse it. Of course, it depends on what you are looking for. Don't expect miracle performance from Ruby, especially compared to C/C++ programs which have been optimized over the past 30 years ;-)
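A minimal sketch of that read-then-parse approach, assuming the same filename variable from the question and a placeholder pattern /ABC.*$/:

data = File.read(filename)        # slurp the whole file into one String
data.scan(/ABC.*$/) do |match|    # /ABC.*$/ is a placeholder regex
  # handle each matching line here
end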

Upvotes: -2

tadman

Reputation: 211580

It's not really fair to compare to grep, because that is a highly tuned utility that only scans the data; it doesn't store any of it. When you're reading that file using Ruby, you end up allocating memory for each line and then releasing it during the garbage collection cycle. grep is a pretty lean and mean regexp-processing machine.

You may find that you can achieve the speed you want by calling an external program like grep, either with system or through the backtick/pipe facility:

`grep ABC bigfile`.split(/\n/).each do |line|
  # ... (called on each matching line) ...
end
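Note that the backticks collect all of grep's output before split runs, so a very large match set still gets buffered in memory. A streaming variant (a sketch using IO.popen, with the same placeholder pattern and file name) avoids that:

IO.popen("grep ABC bigfile") do |io|
  io.each_line do |line|
    # handle each matching line as grep produces it
  end
end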

Upvotes: 5
