J_Alaniz

Reputation: 98

Write to a file in Ruby with every iteration of a loop

I am dealing with two large files that will not fit in my RAM.

My program does not write lines as it runs. It quickly exhausts my RAM and is killed, leaving the target file created but still empty.

I have tried $stdout.puts, f.puts, a block with |f| f.write, and flushing.

The code below does produce the desired output for small files, but splitting my files into smaller pieces does not seem like the way to go.

I have two files with the same number of lines, both in the following format:

>Line1
Line2

And I need to output them as

@Line 1 from file 1
Line 2 from file 1
+ Line 1 from file 2
Line 2 from file 2

Here is my current code:

#!/usr/bin/ruby

file1 = File.open(ARGV[0])
file2 = File.open(ARGV[1])
outFile = File.open(ARGV[2], 'a')
i = 1
(file1.each_slice(2)).zip((file2.each_slice(2))).each do |f1l, f2l|
  outFile.write (f1l[0].tr(">", "@")+"\n")
  outFile.write (f1l[1]+"\n")
  outFile.write (f2l[0].tr(">", "+") +"\n")
  outFile.write (f2l[1]+"\n")
  if (i % 100) == 0
    GC.start
  end
  i = i+1
end
file1.close
file2.close
outFile.close

Upvotes: 0

Views: 1579

Answers (1)

Cary Swoveland

Reputation: 110675

Let's use IO::write to create two input files.

FNameIn1 = 'in1'
File.write(FNameIn1, "cow\npig\ngoat\nhen\n")
  #=> 17

We can use IO::read to confirm what was written.

puts File.read(FNameIn1)
cow
pig
goat
hen
FNameIn2 = 'in2'
File.write(FNameIn2, "12\n34\n56\n78\n")
  #=> 12 
puts File.read(FNameIn2)
12
34
56
78

Next, use File::open to open the two input files for reading, obtaining a file handle for each.

f1 = File.open(FNameIn1)
  #=> #<File:in1> 
f2 = File.open(FNameIn2)
  #=> #<File:in2>

Now open a file for writing.

FNameOut = 'out'
f = File.open(FNameOut, "w")
  #=> #<File:out>

Assuming the two input files have the same number of lines, use an until loop to read the next two lines from each file, combine them in some way, and then write the resulting line to the output file.

until f1.eof
  line11 = f1.gets.chomp
  line12 = f1.gets.chomp
  line21 = f2.gets.chomp
  line22 = f2.gets.chomp
  f.puts "%s %s, %s %s" % [line11, line21, line12, line22]
end

See IO#eof, IO#gets and IO#puts.
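One caveat with the loop above: IO#gets returns nil at end of file, so calling chomp on its result raises NoMethodError if a file has an odd number of lines or the second file is shorter than the first. Below is a minimal sketch of a guard, not part of the original answer. The helper name read_pair is hypothetical, and StringIO stands in for an open file only to keep the example self-contained.

```ruby
require "stringio"

# Hypothetical helper: read the next two-line record from an open IO.
# Returns [line1, line2] without trailing newlines, or nil at end of file.
def read_pair(io)
  first = io.gets
  return nil if first.nil?          # clean end of file, no more records

  second = io.gets
  # A first line without a second one means the record is incomplete.
  raise EOFError, "file ended in the middle of a two-line record" if second.nil?

  [first.chomp, second.chomp]
end

# StringIO behaves like an open file for gets; it is used here for illustration.
io = StringIO.new("cow\npig\ngoat\nhen\n")
while (pair = read_pair(io))
  p pair
end
```

With a helper like this, the loop condition becomes the nil check on the first file, and an incomplete record fails with an explicit error rather than a NoMethodError on nil.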

Lastly, use IO#close to close the files.

f1.close
f2.close
f.close

Let's see what the file FNameOut looks like.

puts File.read(FNameOut)
cow 12, pig 34
goat 56, hen 78

We can have Ruby close the files by using a block for each File::open:

File.open(FNameIn1) do |f1|
  File.open(FNameIn2) do |f2|
    File.open(FNameOut, "w") do |f|
      until f1.eof
        line11 = f1.gets.chomp
        line12 = f1.gets.chomp
        line21 = f2.gets.chomp
        line22 = f2.gets.chomp
        f.puts "%s %s, %s %s" % [line11, line21, line12, line22]
      end
    end
  end 
end
puts File.read FNameOut
cow 12, pig 34
goat 56, hen 78

This is in fact how it's normally done in Ruby, in part to avoid the possibility of forgetting to close files.
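The same block-based pattern can be adapted to the output format requested in the question. The following is a sketch under stated assumptions rather than a tested solution for the asker's data: the method name interleave_records and the sample file contents are made up for illustration, and only a leading ">" is replaced (the question's code uses tr, which replaces every ">" in the line).

```ruby
require "tmpdir"

# Hypothetical helper: stream two-line records from two files into one
# output file, holding only four lines in memory at a time.
def interleave_records(path1, path2, out_path)
  File.open(path1) do |f1|
    File.open(path2) do |f2|
      File.open(out_path, "w") do |out|
        until f1.eof?
          header1 = f1.gets.chomp
          body1   = f1.gets.chomp
          header2 = f2.gets.chomp
          body2   = f2.gets.chomp
          out.puts header1.sub(/\A>/, "@")  # ">Line1" from file 1 becomes "@Line1"
          out.puts body1
          out.puts header2.sub(/\A>/, "+")  # ">Line1" from file 2 becomes "+Line1"
          out.puts body2
        end
      end
    end
  end
end

# Small demonstration with made-up sample data in a temporary directory.
Dir.mktmpdir do |dir|
  in1 = File.join(dir, "in1")
  in2 = File.join(dir, "in2")
  out = File.join(dir, "out")
  File.write(in1, ">a\nACGT\n>b\nTTGA\n")
  File.write(in2, ">a\nIIII\n>b\nJJJJ\n")
  interleave_records(in1, in2, out)
  puts File.read(out)
end
```

Because each iteration reads four lines and writes four lines, memory use should stay flat regardless of file size, and no explicit GC.start is needed.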

Here's another way, using IO::foreach, which, without a block, returns an enumerator, allowing the use of Enumerable#each_slice, as referenced in the question.

e1 = File.foreach(FNameIn1).each_slice(2)
  #=> #<Enumerator: #<Enumerator: File:foreach("in1")>:each_slice(2)>
e2 = File.foreach(FNameIn2).each_slice(2)
  #=> #<Enumerator: #<Enumerator: File:foreach("in2")>:each_slice(2)> 

File.open(FNameOut, "w") do |f|
  loop do
    line11, line12 = e1.next.map(&:chomp)
    line21, line22 = e2.next.map(&:chomp)
    f.puts "%s %s, %s %s" % [line11, line21, line12, line22]
  end
end
puts File.read(FNameOut)
cow 12, pig 34
goat 56, hen 78

We may observe the values generated by the enumerator

e1 = File.foreach(FNameIn1).each_slice(2)

by repeatedly executing Enumerator#next:

e1.next
  #=> ["cow\n", "pig\n"] 
e1.next
  #=> ["goat\n", "hen\n"] 
e1.next
  #=> StopIteration (iteration reached an end)

The StopIteration exception, when raised, is handled by Kernel#loop by breaking out of the loop (which is one reason why loop is so useful).
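One consequence worth keeping in mind: loop rescues StopIteration no matter which enumerator raises it, so if the second input is shorter than the first, the loop ends quietly and the remaining items of the first input are dropped. The sketch below illustrates this with array enumerators instead of files. The method name pair_up is hypothetical.

```ruby
# Hypothetical helper: pair items from two enumerators using Kernel#loop.
def pair_up(enum1, enum2)
  pairs = []
  loop do
    a = enum1.next   # raises StopIteration when enum1 is exhausted
    b = enum2.next   # raises StopIteration when enum2 is exhausted
    pairs << [a, b]
  end
  pairs
end

# The second enumerator is one item short, so "goat" is silently dropped.
p pair_up(%w[cow pig goat].each, %w[12 34].each)
# prints [["cow", "12"], ["pig", "34"]]
```

If the two files are expected to have the same number of lines, comparing the record counts afterwards (or rescuing StopIteration explicitly) is one way to detect a mismatch.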

Upvotes: 2
