Ruby Novice

Reputation: 101

Most Efficient Way to Write to Fixed Width File (Ruby)

I'm currently working with extremely large fixed-width files, sometimes well over a million lines. I have written a method that overwrites a file based on a set of parameters, but I think there has to be a more efficient way to accomplish this. The current code I'm using is:

def self.writefiles(file_name, positions, update_value)
@file_name = file_name
@positions = positions.to_i
@update_value = update_value

line_number = 0
@file_contents = File.open(@file_name, 'r').readlines

    while line_number < @file_contents.length
       @read_file_contents = @file_contents[line_number]
       @read_file_contents[@positions] = @update_value
       @file_contents[line_number] = @read_file_contents
       line_number += 1
    end

write_over_file = File.new(@file_name, 'w')
line_number = 0 

    while line_number < @file_contents.length
        write_over_file.write @file_contents[line_number]
        line_number += 1
    end

write_over_file.close
end

For example, if position 25 in the file indicated that it is an original file the value would be set to "O" and if I wanted to replace that value I would use ClassName.writefiles(filename, 140, "X") to change this position on each line. Any help on making this method more efficient would be greatly appreciated!

Thanks

Upvotes: 1

Views: 1933

Answers (3)

Mladen Jablanović

Reputation: 44080

#!/usr/bin/ruby
# replace_at_pos.rb
pos, char, infile, outfile = $*
pos = pos.to_i
File.open(outfile, 'w') do |f|
  File.foreach(infile) do |line|
    line[pos] = char
    f.puts line
  end
end

and you use it as:

replace_at_pos.rb 140 X inputfile.txt outputfile.txt

To replace a set of values, you can use a hash that maps each position to its new character:

replace = {
  100 => 'a',
  155 => 'c',
  151 => 't'
}
. . .
replace.each do |k, v|
  line[k] = v
end
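
Putting the two pieces together, a minimal sketch might look like the following. The method name replace_at_positions and the bounds check are assumptions added for illustration; they are not part of the original script. Positions are zero-based, as with line[pos] above.

```ruby
# Hypothetical helper (not from the original answer): apply several
# single-character replacements to every line of infile, streaming the
# result to outfile one line at a time.
def replace_at_positions(infile, outfile, replacements)
  File.open(outfile, 'w') do |out|
    File.foreach(infile) do |line|
      replacements.each do |pos, char|
        # Assumption: silently skip positions past the end of the line's
        # content, so short lines are left alone instead of raising IndexError.
        line[pos] = char if pos < line.chomp.length
      end
      # Each line still carries its own newline, so write it unchanged.
      out.write(line)
    end
  end
end
```

It could be called as replace_at_positions('inputfile.txt', 'outputfile.txt', { 100 => 'a', 155 => 'c', 151 => 't' }).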

Upvotes: 0

Shizzmo

Reputation: 16887

If it's a fixed width file, you can open the file for read/write and use seek to move to the start of the data you want to write, and only write the data you're changing and not the whole line. This would probably be more efficient than rewriting the entire file to replace one field.

Here's a crude example. It reads the last field of each record (10, 20, 30), increments it by 1, and writes it back:

tha_file (10 characters per line, including newline)

12 3 x 10
23 4 x 20
78 9 x 30

seeker.rb

#!/usr/bin/env ruby
RECORD_WIDTH = 10  # characters per record, including the newline
POS = 8            # 1-based column where the field starts
FIELD_WIDTH = 2

# The block form closes the file automatically
File.open("tha_file", "r+") do |fh|
  # seek to first field
  fh.seek(POS - 1, IO::SEEK_CUR)

  until fh.eof?
    cur_val = fh.read(FIELD_WIDTH).to_i
    puts "read #{cur_val}"
    fh.seek(-FIELD_WIDTH, IO::SEEK_CUR)
    cur_val += 1

    # Assumes the new value still fits in FIELD_WIDTH characters
    fh.write(cur_val.to_s)
    puts "wrote #{cur_val}"

    # Move to start of next field in the middle of next record
    fh.seek(RECORD_WIDTH - FIELD_WIDTH, IO::SEEK_CUR)
  end
end
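
Applied to the question's case (one character at the same position on every line), the same idea can be sketched roughly as below. The method name overwrite_column is hypothetical, and the sketch assumes every record, including its line terminator, is exactly record_width bytes and that the replacement is a single-byte character. Unlike the 1-based position in the example above, pos here is zero-based, matching line[pos] in the question.

```ruby
# Hypothetical helper (not from the original answer): overwrite one byte at
# the same zero-based offset in every fixed-width record, in place.
# Assumes each record, newline included, is exactly record_width bytes.
def overwrite_column(path, record_width, pos, char)
  File.open(path, 'r+b') do |fh|
    size = fh.size
    offset = pos
    while offset < size
      # Jump straight to the target byte of this record and replace it
      fh.seek(offset, IO::SEEK_SET)
      fh.write(char)
      offset += record_width
    end
  end
end
```

Because only one byte per record is written, nothing else in the file is rewritten. If the records are not truly fixed width (for example, mixed line endings), this would corrupt the data, so it may be worth checking that the file size is a multiple of record_width first.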

Upvotes: 1

Steve Weet

Reputation: 28392

You will certainly save some time and quite a lot of memory by reworking the program to read the file one line at a time (you are currently reading the whole file into memory). Within the loop you write each updated line to a backup copy of the file, and at the end you rename that copy over the original. Something like this:

  def self.writefiles2(file_name, positions, update_value)
    @file_name = file_name
    @new_file = file_name + ".bak"
    @positions = positions.to_i
    @update_value = update_value

    reader = File.open(@file_name, 'r')
    writer = File.open(@new_file, 'w')

    # Read one line at a time so the whole file is never held in memory
    while (line = reader.gets)
      line[@positions] = @update_value
      writer.puts(line)
    end
    reader.close
    writer.close
    # Rename the new file over the original
    File.rename(@new_file, @file_name)
  end

This would of course want some error handling around the rename step, since a failure there could result in the loss of your input data.
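
One way that error handling might look is sketched below. The method name rewrite_safely is hypothetical, and using Tempfile.create in the same directory as the input is an assumption chosen so that the final rename stays on one filesystem. If anything fails before the rename, the temporary file is removed and the original is left untouched.

```ruby
require 'tempfile'

# Hypothetical helper (not from the original answer): write the updated
# lines to a temporary file and replace the original only on success.
def rewrite_safely(file_name, position, update_value)
  # Assumption: the temp file lives next to the original so File.rename
  # does not have to cross filesystems.
  temp = Tempfile.create(['rewrite', '.tmp'], File.dirname(file_name))
  begin
    File.foreach(file_name) do |line|
      line[position] = update_value # raises IndexError if the line is too short
      temp.write(line)
    end
    temp.close
    # Only now is the original replaced
    File.rename(temp.path, file_name)
  rescue StandardError
    # Clean up the partial output and re-raise; the original is unchanged
    temp.close unless temp.closed?
    File.delete(temp.path) if File.exist?(temp.path)
    raise
  end
end
```

Note that the replacement file may not keep the original's permissions or ownership, so those may need to be restored separately.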

Upvotes: 0
