Reputation: 101
I'm currently working with extremely large fixed width files, sometimes well over a million lines. I have written a method that can write over the files based on a set of parameters, but I think there has to be a more efficient way to accomplish this. The current code I'm using is:
def self.writefiles(file_name, positions, update_value)
  @file_name = file_name
  @positions = positions.to_i
  @update_value = update_value
  line_number = 0
  @file_contents = File.open(@file_name, 'r').readlines
  while line_number < @file_contents.length
    @read_file_contents = @file_contents[line_number]
    @read_file_contents[@positions] = @update_value
    @file_contents[line_number] = @read_file_contents
    line_number += 1
  end
  write_over_file = File.new(@file_name, 'w')
  line_number = 0
  while line_number < @file_contents.length
    write_over_file.write @file_contents[line_number]
    line_number += 1
  end
  write_over_file.close
end
For example, if position 140 in the file indicated that it is an original file, the value would be set to "O", and to replace that value I would call ClassName.writefiles(filename, 140, "X") to change that position on each line. Any help making this method more efficient would be greatly appreciated!
Thanks
Upvotes: 1
Views: 1933
Reputation: 44080
#!/usr/bin/ruby
# replace_at_pos.rb
pos, char, infile, outfile = $*
pos = pos.to_i

File.open(outfile, 'w') do |f|
  File.foreach(infile) do |line|
    line[pos] = char
    f.puts line
  end
end
and you use it as:
replace_at_pos.rb 140 X inputfile.txt outputfile.txt
For replacing a set of values, you can use a hash:
replace = {
  100 => 'a',
  155 => 'c',
  151 => 't'
}
. . .
replace.each do |k, v|
  line[k] = v
end
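Putting that together, a minimal streaming version of the multi-position case might look like the sketch below (the positions, characters, and the replace_many.rb name are made up for illustration; it assumes every line is long enough to contain all the positions):
#!/usr/bin/ruby
# replace_many.rb -- apply several position => character
# replacements to every line of a fixed width file.
replace = {
  100 => 'a',
  155 => 'c',
  151 => 't'
}

infile, outfile = $*

File.open(outfile, 'w') do |f|
  File.foreach(infile) do |line|
    # Overwrite each fixed position in the current line.
    replace.each do |pos, char|
      line[pos] = char
    end
    f.puts line
  end
end
As with replace_at_pos.rb, this streams one line at a time, so memory use stays flat no matter how large the input file gets.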
Upvotes: 0
Reputation: 16887
If it's a fixed width file, you can open the file for read/write and use seek to move to the start of the data you want to write, and only write the data you're changing and not the whole line. This would probably be more efficient than rewriting the entire file to replace one field.
Here's a crude example. It reads the last field (10, 20, 30), increments it by 1, and writes it back:
tha_file (10 characters per line, including newline)
12 3 x 10
23 4 x 20
78 9 x 30
seeker.rb
#!/usr/bin/env ruby
fh = open("tha_file", "r+")

$RECORD_WIDTH = 10
$POS = 8
$FIELD_WIDTH = 2

# Seek to the field in the first record
fh.seek($POS - 1, IO::SEEK_CUR)

while !fh.eof?
  cur_val = fh.read($FIELD_WIDTH).to_i
  puts "read #{cur_val}"
  # Seek back over the field we just read, then overwrite it
  fh.seek(-1 * $FIELD_WIDTH, IO::SEEK_CUR)
  cur_val = cur_val + 1
  fh.write(cur_val)
  puts "wrote #{cur_val}"
  # Move to the start of the same field in the next record
  fh.seek($RECORD_WIDTH - $FIELD_WIDTH, IO::SEEK_CUR)
end

fh.close
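Applied to the original question (setting position 140 to "X"), the same trick avoids rewriting any of the untouched bytes. A rough sketch, assuming a record width of 142 bytes (141 data characters plus the newline) and a hypothetical file name:
#!/usr/bin/env ruby
# in_place_flag.rb -- overwrite one character per record in place.
RECORD_WIDTH = 142   # assumed: 141 data characters plus the newline
POS = 140            # zero-based offset of the flag within each record
CHAR = "X"

File.open("inputfile.txt", "r+") do |fh|
  (fh.size / RECORD_WIDTH).times do |rec|
    fh.seek(rec * RECORD_WIDTH + POS)   # jump straight to the flag field
    fh.write(CHAR)                      # touch only the byte that changes
  end
end
Since only one byte per record is written, the cost is dominated by seeks rather than by shuffling a million lines through memory.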
Upvotes: 1
Reputation: 28392
You will certainly save some time and quite a lot of memory by reworking the program to read the file one line at a time (you are currently reading the whole file into memory). You then write each modified line to a new copy of the file within the loop and rename that copy over the original at the end. Something like this:
def self.writefiles2(file_name, positions, update_value)
  @file_name = file_name
  @new_file = file_name + ".bak"
  @positions = positions.to_i
  @update_value = update_value
  reader = File.open(@file_name, 'r')
  writer = File.open(@new_file, 'w')
  while (line = reader.gets)
    line[@positions] = @update_value
    writer.puts(line)
  end
  reader.close
  writer.close
  # Rename the new copy over the original
  File.rename(@new_file, @file_name)
end
This would of course need some error handling around the rename step, as a failure at the wrong moment could result in the loss of your input data.
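A minimal way to guard that last step, keeping the original file in place unless the new copy was written successfully (a sketch; the rescue policy is a matter of taste):
begin
  File.rename(@new_file, @file_name)
rescue SystemCallError => e
  # The original file is untouched; the updated copy is still in @new_file.
  warn "rename failed: #{e.message}"
end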
Upvotes: 0