Write to a file in Ruby with every iteration of a loop

Question

I am dealing with two large files that will not fit in my RAM.

I want to read two lines from each file at a time, process them and write them out. It has to be two lines at a time because of the nature of these files.

My program does not write lines as it runs. It quickly uses up my RAM and is killed, leaving the target file created but still empty.

I tried $stdout.puts, f.puts, |f| f.write and flushing.

The code does produce the desired output for small files, but splitting my large files into smaller pieces does not seem like the right approach.

I have two files, both with the same number of lines and both following this format:

>Line1
Line2

And I need to output them as

@Line 1 from file 1
Line 2 from file 1
+ Line 1 from file 2
Line 2 from file 2

Here is my current code:

#!/usr/bin/ruby

file1 = File.open(ARGV[0])
file2 = File.open(ARGV[1])
outFile = File.open(ARGV[2], 'a')
i = 1
(file1.each_slice(2)).zip((file2.each_slice(2))).each do |f1l, f2l|
  outFile.write (f1l[0].tr(">", "@")+"\n")
  outFile.write (f1l[1]+"\n")
  outFile.write (f2l[0].tr(">", "+") +"\n")
  outFile.write (f2l[1]+"\n")
  if (i % 100) == 0
    GC.start
  end
  i = i+1
end
file1.close
file2.close
outFile.close

Cary Swoveland · Accepted Answer

Let's use IO::write to create two input files.

FNameIn1 = 'in1'
File.write(FNameIn1, "cow\npig\ngoat\nhen\n")
  #=> 17

We can use IO::read to confirm what was written.

puts File.read(FNameIn1)
cow
pig
goat
hen

FNameIn2 = 'in2'
File.write(FNameIn2, "12\n34\n56\n78\n")
  #=> 12
puts File.read(FNameIn2)
12
34
56
78

Next, use File::open to open the two input files for reading, obtaining a file handle for each.

f1 = File.open(FNameIn1)
  #=> #<File:in1>
f2 = File.open(FNameIn2)
  #=> #<File:in2>

Now open a file for writing.

FNameOut = 'out'
f = File.open(FNameOut, "w")
  #=> #<File:out>

Assuming the two input files have the same number of lines, use an until loop to read the next two lines from each file, combine the four lines in some way and then write the result to the output file.

until f1.eof
  line11 = f1.gets.chomp
  line12 = f1.gets.chomp
  line21 = f2.gets.chomp
  line22 = f2.gets.chomp
  f.puts "%s %s, %s %s" % [line11, line21, line12, line22]
end

See IO#eof, IO#gets and IO#puts.
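A quick check of the behavior the loop relies on, using a hypothetical scratch file named demo_in: IO#gets returns nil (rather than raising) once the end of the file is reached, and IO#eof reports true at that point. Testing eof before reading therefore avoids calling chomp on nil.

```ruby
# Hypothetical two-line scratch file.
File.write("demo_in", "cow\npig\n")

f = File.open("demo_in")
first  = f.gets   # "cow\n"
second = f.gets   # "pig\n"
at_end = f.eof    # true: nothing is left to read
extra  = f.gets   # nil: gets signals end of file by returning nil
f.close
```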

Lastly, use IO#close to close the files.

f1.close
f2.close
f.close

Let's see what FNameOut looks like.

puts File.read(FNameOut)
cow 12, pig 34
goat 56, hen 78

We can have Ruby close the files by using a block for each File::open:

File.open(FNameIn1) do |f1|
  File.open(FNameIn2) do |f2|
    File.open(FNameOut, "w") do |f|
      until f1.eof
        line11 = f1.gets.chomp
        line12 = f1.gets.chomp
        line21 = f2.gets.chomp
        line22 = f2.gets.chomp
        f.puts "%s %s, %s %s" % [line11, line21, line12, line22]
      end
    end
  end 
end

puts File.read FNameOut
cow 12, pig 34
goat 56, hen 78

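This is in fact how it's normally done in Ruby, in part because the block form closes each file automatically when its block exits, even if an exception is raised, so there is no chance of forgetting to close one.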
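The examples above merge the lines into a single output line. For the four-line layout shown in the question (the file-1 header's > becoming @ and the file-2 header's > becoming +), the block-based loop might be adapted roughly as follows. This is a sketch: the file names q1, q2 and q_out and their contents are hypothetical, and both files are assumed to have the same even number of lines.

```ruby
# Hypothetical inputs in the question's format: a ">" header line, then a data line.
File.write("q1", ">r1\nACGT\n>r2\nTTGA\n")
File.write("q2", ">r1\nIIII\n>r2\nHHHH\n")

File.open("q1") do |f1|
  File.open("q2") do |f2|
    File.open("q_out", "w") do |f|
      until f1.eof
        # One two-line record from each file; chomp drops the trailing newline.
        header1, data1 = f1.gets.chomp, f1.gets.chomp
        header2, data2 = f2.gets.chomp, f2.gets.chomp
        # puts writes each argument on its own line, restoring the newlines.
        f.puts(header1.tr(">", "@"), data1, header2.tr(">", "+"), data2)
      end
    end
  end
end
```

Because each record is written as soon as it is read, memory use should stay roughly constant regardless of file size.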

Here's another way, using IO::foreach, which, without a block, returns an enumerator, allowing the use of Enumerable#each_slice, as referenced in the question.

e1 = File.foreach(FNameIn1).each_slice(2)
  #=> #<Enumerator: #<Enumerator: File:foreach("in1")>:each_slice(2)>
e2 = File.foreach(FNameIn2).each_slice(2)
  #=> #<Enumerator: #<Enumerator: File:foreach("in2")>:each_slice(2)>

File.open(FNameOut, "w") do |f|
  loop do
    line11, line12 = e1.next.map(&:chomp)
    line21, line22 = e2.next.map(&:chomp)
    f.puts "%s %s, %s %s" % [line11, line21, line12, line22]
  end
end

puts File.read(FNameOut)
cow 12, pig 34
goat 56, hen 78
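The question's own each_slice/zip version can stream too. A plausible reason it exhausted memory is that Enumerable#zip, called without a block, returns an Array holding every pair, so both files are read in full before .each runs; given a block, zip yields each pair as it goes. A minimal sketch under that reading, with hypothetical file names seq1, seq2 and combined. It also drops the question's extra "\n", since each line yielded by each_slice already ends in a newline.

```ruby
# Hypothetical sample inputs in the question's format (">" header, then a data line).
File.write("seq1", ">a1\nAAAA\n>a2\nCCCC\n")
File.write("seq2", ">b1\nIIII\n>b2\nJJJJ\n")

File.open("seq1") do |file1|
  File.open("seq2") do |file2|
    File.open("combined", "w") do |out_file|
      # With a block, zip yields each pair as it is read; without one it would
      # first build an Array containing every pair from both files.
      file1.each_slice(2).zip(file2.each_slice(2)) do |f1l, f2l|
        out_file.write(f1l[0].tr(">", "@"))  # header from file 1 (keeps its "\n")
        out_file.write(f1l[1])               # data line from file 1
        out_file.write(f2l[0].tr(">", "+"))  # header from file 2
        out_file.write(f2l[1])               # data line from file 2
      end
    end
  end
end
```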

We may observe the values generated by the enumerator

e1 = File.foreach(FNameIn1).each_slice(2)

by repeatedly executing Enumerator#next:

e1.next
  #=> ["cow\n", "pig\n"]
e1.next
  #=> ["goat\n", "hen\n"]
e1.next
  #=> StopIteration (iteration reached an end)

The StopIteration exception, when raised, is handled by Kernel#loop by breaking out of the loop (which is one reason why loop is so useful).
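A small illustration of that behavior, with an array standing in for a file so it needs no setup: once Enumerator#next raises StopIteration, loop exits quietly and execution continues after it.

```ruby
e = %w[cow pig goat].each_slice(2)
slices = []
# loop rescues StopIteration, so no explicit break or begin/rescue is needed.
loop do
  slices << e.next
end
# Execution reaches this point once the enumerator is exhausted.
```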
