Reputation: 98
I am dealing with two large files that will not fit in my RAM:
My program does not write lines as it runs; it quickly uses up my RAM and is killed, leaving the target file created but still empty.
I tried $stdout.puts, f.puts, |f| f.write and flushing.
This does produce the desired output in small files, but separating my files just does not seem like the way to go.
I have two files, both with the same number of lines and both following the format:
>Line1
Line2
And I need to output them as
@Line 1 from file 1
Line 2 from file 1
+ Line 1 from file 2
Line 2 from file 2
Here is my current code:
#!/usr/bin/ruby
file1 = File.open(ARGV[0])
file2 = File.open(ARGV[1])
outFile = File.open(ARGV[2], 'a')
i = 1
(file1.each_slice(2)).zip((file2.each_slice(2))).each do |f1l, f2l|
  outFile.write (f1l[0].tr(">", "@")+"\n")
  outFile.write (f1l[1]+"\n")
  outFile.write (f2l[0].tr(">", "+") +"\n")
  outFile.write (f2l[1]+"\n")
  if (i % 100) == 0
    GC.start
  end
  i = i+1
end
file1.close
file2.close
outFile.close
Upvotes: 0
Views: 1579
Reputation: 110675
Let's use IO::write to create two input files.
FNameIn1 = 'in1'
File.write(FNameIn1, "cow\npig\ngoat\nhen\n")
#=> 17
We can use IO::read to confirm what was written.
puts File.read(FNameIn1)
cow
pig
goat
hen
FNameIn2 = 'in2'
File.write(FNameIn2, "12\n34\n56\n78\n")
#=> 12
puts File.read(FNameIn2)
12
34
56
78
Next, use File::open to open the two input files for reading, obtaining a file handle for each.
f1 = File.open(FNameIn1)
#=> #<File:in1>
f2 = File.open(FNameIn2)
#=> #<File:in2>
Now open a file for writing.
FNameOut = 'out'
f = File.open(FNameOut, "w")
#=> #<File:out>
Assuming the two input files have the same number of lines, use an until loop to read the next two lines from each, combine them in some way and then write the resulting line to the output file.
until f1.eof
  line11 = f1.gets.chomp
  line12 = f1.gets.chomp
  line21 = f2.gets.chomp
  line22 = f2.gets.chomp
  f.puts "%s %s, %s %s" % [line11, line21, line12, line22]
end
See IO#eof, IO#gets and IO#puts.
Lastly, use IO#close to close the files.
f1.close
f2.close
f.close
Let's see what the file FNameOut looks like.
puts File.read(FNameOut)
cow 12, pig 34
goat 56, hen 78
We can have Ruby close the files by using a block for each File::open:
File.open(FNameIn1) do |f1|
  File.open(FNameIn2) do |f2|
    File.open(FNameOut, "w") do |f|
      until f1.eof
        line11 = f1.gets.chomp
        line12 = f1.gets.chomp
        line21 = f2.gets.chomp
        line22 = f2.gets.chomp
        f.puts "%s %s, %s %s" % [line11, line21, line12, line22]
      end
    end
  end
end
puts File.read FNameOut
cow 12, pig 34
goat 56, hen 78
This is in fact how it's normally done in Ruby, in part to avoid the possibility of forgetting to close files.
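Applied to the record format in the question, the same streaming pattern might look like the sketch below. The method name interleave_records and its parameters are made up for illustration. It assumes every record is exactly two lines and that header lines start with ">". Unlike the question's tr call, it only replaces a leading ">".

```ruby
# Hypothetical helper (not from the original post): merges two-line records
# from two files while holding only a handful of lines in memory at a time.
def interleave_records(path1, path2, out_path)
  File.open(path1) do |f1|
    File.open(path2) do |f2|
      File.open(out_path, "w") do |out|
        # Stop as soon as either input runs out of lines.
        until f1.eof? || f2.eof?
          head1 = f1.gets.chomp
          body1 = f1.gets.to_s.chomp  # to_s guards against a missing second line
          head2 = f2.gets.chomp
          body2 = f2.gets.to_s.chomp

          out.puts head1.sub(/\A>/, "@")  # ">" header of file 1 becomes "@"
          out.puts body1
          out.puts head2.sub(/\A>/, "+")  # ">" header of file 2 becomes "+"
          out.puts body2
        end
      end
    end
  end
end
```

It could be called from a script as interleave_records(ARGV[0], ARGV[1], ARGV[2]). Because each iteration writes its four lines before reading the next record, memory use should stay roughly constant regardless of file size.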
Here's another way, using IO::foreach, which, without a block, returns an enumerator, allowing the use of Enumerable#each_slice, as referenced in the question.
e1 = File.foreach(FNameIn1).each_slice(2)
#=> #<Enumerator: #<Enumerator: File:foreach("in1")>:each_slice(2)>
e2 = File.foreach(FNameIn2).each_slice(2)
#=> #<Enumerator: #<Enumerator: File:foreach("in2")>:each_slice(2)>
File.open(FNameOut, "w") do |f|
  loop do
    line11, line12 = e1.next.map(&:chomp)
    line21, line22 = e2.next.map(&:chomp)
    f.puts "%s %s, %s %s" % [line11, line21, line12, line22]
  end
end
puts File.read(FNameOut)
cow 12, pig 34
goat 56, hen 78
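A likely contributor to the memory problem in the question is that Enumerable#zip, when called without a block, returns an array of all the pairs, so both inputs are consumed completely before anything is written. The small sketch below illustrates this, using in-memory StringIO objects as stand-ins for the two files (the sample data is invented):

```ruby
require "stringio"

# Stand-ins for the two input files (made-up sample records).
io1 = StringIO.new(">a\nAAA\n>b\nBBB\n")
io2 = StringIO.new(">c\n111\n>d\n222\n")

# Same shape as the expression in the question: zip without a block.
pairs = io1.each_line.each_slice(2).zip(io2.each_line.each_slice(2))

# zip has already read both inputs to the end and built one big Array.
puts pairs.class
puts pairs.size
```

Pulling one slice at a time with Enumerator#next, as in the code above, avoids building that array.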
We may observe the values generated by the enumerator
e1 = File.foreach(FNameIn1).each_slice(2)
by repeatedly executing Enumerator#next:
e1.next
#=> ["cow\n", "pig\n"]
e1.next
#=> ["goat\n", "hen\n"]
e1.next
#=> StopIteration (iteration reached an end)
The StopIteration exception, when raised, is handled by Kernel#loop by breaking out of the loop (which is one reason why loop is so useful).
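The behaviour can be seen in isolation with a small sketch that uses an ordinary array in place of a file (the variable names are only illustrative):

```ruby
# An enumerator over slices of two elements, as with the file enumerators.
e = [1, 2, 3].each_slice(2)

seen = []
loop do
  # Enumerator#next raises StopIteration once the slices are exhausted;
  # Kernel#loop rescues it and simply ends the loop.
  seen << e.next
end

# Execution continues normally after the loop.
p seen
```

No explicit rescue clause or end-of-input check is needed, which keeps the reading loop short.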
Upvotes: 2