rubyUser

Reputation: 5

Replace a string in csv using regex using ruby

I have a CSV file with the columns test and id, and the values are:

"abc is 123 test", 1

"abc is 123 test", 2

"abc is 123 test", 3

"abc is 123 test", 4

"abc is 123 test", 5

I want to replace the "abc is 123 test" with "abc is 567 test".

Note: the values 123 and 567 are dynamic. The 123 changes with every new CSV, but the surrounding string "abc is <value> test" always stays the same.

Code I tried:

folder_path = "/home/test/files/"
f1 = folder_path + "abc.csv"
string_replace = "abc is 567 test"

file = IO.read(/home/test/files/abc.csv")
file_final = expected_file.gsub!("abc is".*, string_replace)
File.open(f1, 'w') { |f| f.write(file_final) }

I am getting the error:

ArgumentError: wrong number of arguments calling * (0 for 1)

Can anyone help?

Upvotes: 0

Views: 1182

Answers (1)

the Tin Man

Reputation: 160551

While technically the files are CSV, we can treat CSV files as text, since that's what they are. That makes it much easier to munge them when they're simple.

I'd start with:

File.open('csv.new', 'w') do |fo|
  DATA.each_line do |li|
    fo.puts li.sub('123', '456')
  end
end

__END__
"abc is 123 test", 1
"abc is 123 test", 2
"abc is 123 test", 3
"abc is 123 test", 4
"abc is 123 test", 5

Running it generates a file called "csv.new" which contains:

"abc is 456 test", 1
"abc is 456 test", 2
"abc is 456 test", 3
"abc is 456 test", 4
"abc is 456 test", 5

Instead of:

DATA.each_line do |li|

you'd want to open your original file using:

File.foreach("/home/test/files/abc.csv") do |li|

(DATA and __END__ are a way to access sample data stored at the end of a Ruby script.)
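Putting the two pieces above together, a minimal sketch of the whole script might look like this. The method name replace_in_lines and the temporary file names are placeholders of mine, not part of the original code, and it uses the same plain-string sub as the first example.

```ruby
require 'tmpdir'

# Hypothetical helper: copy src to dst line by line, replacing the first
# occurrence of target on each line. Both files are closed when the blocks end.
def replace_in_lines(src, dst, target, replacement)
  File.open(dst, 'w') do |fo|
    File.foreach(src) do |li|
      fo.puts li.sub(target, replacement)
    end
  end
end

# Demo against a throwaway directory so nothing real is overwritten.
Dir.mktmpdir do |dir|
  src = File.join(dir, 'abc.csv')
  dst = File.join(dir, 'abc.csv.new')
  File.write(src, %("abc is 123 test", 1\n"abc is 123 test", 2\n))

  replace_in_lines(src, dst, '123', '456')
  puts File.read(dst)
end
```

Writing to a separate output file keeps the original intact until you've checked the result.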

'123' is prone to false-positive hits, and would change sub-strings:

'0123456'.sub('123', '456') # => "0456456"

To counter that, if there is any chance of sub-string matches, you'd want a more precise search pattern; I'd use a regular expression:

'0123456'.sub(/\b123\b/, '456') # => "0123456"

which now checks to see if there's a word boundary surrounding 123:

'0 123 456'.sub(/\b123\b/, '456') # => "0 456 456"

Since "123" could change, it'd make sense to assign it to a constant then substitute that into the pattern:

TARGET_STR = '123'

'0123456'.sub(/\b#{TARGET_STR}\b/, '456') # => "0123456"
'0 123 456'.sub(/\b#{TARGET_STR}\b/, '456') # => "0 456 456"
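If the old value isn't known in advance, which is how I read the question, one option is to match on the fixed text around it instead. This is only a sketch: replace_value is a made-up helper name, and it assumes the changing value always consists of digits. The second half shows Regexp.escape, which is worth using whenever an interpolated value might contain regex metacharacters.

```ruby
# Hypothetical helper: swap whatever digits sit between "abc is " and " test".
# Assumption: the changing value is always one or more digits.
def replace_value(line, new_value)
  line.sub(/\b(abc is )\d+( test)\b/) { "#{$1}#{new_value}#{$2}" }
end

puts replace_value('"abc is 123 test", 1', '567')
# prints: "abc is 567 test", 1

# When interpolating a known value instead, Regexp.escape keeps characters
# such as "." from being treated as regex metacharacters.
target = '1.5'
puts 'a 1x5 1.5'.sub(/\b#{target}\b/, '2.0')                 # prints: a 2.0 1.5
puts 'a 1x5 1.5'.sub(/\b#{Regexp.escape(target)}\b/, '2.0')  # prints: a 1x5 2.0
```

The block form of sub is used so that a replacement starting with a digit can't be misread as part of a backreference.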

Because I'm using blocks with File.open and File.foreach, Ruby will automatically close the files once the blocks end, resulting in cleaner code and better management of file handles.

Your code:

file = IO.read(/home/test/files/abc.csv")
file_final = expected_file.gsub!("abc is".*, string_replace)
File.open(f1, 'w') { |f| f.write(file_final) }

... is a ... mess.

  • read is great for files you know will always be below 1MB in size. If you don't know that, especially if you're working in a production environment where files can be well into the GB range, using line-by-line IO is faster and safer as it sidesteps scalability issues. See "Why is "slurping" a file not a good practice?" for more information.
  • We don't know what expected_file is. As posted it's never assigned, so Ruby raises a NameError as soon as it's referenced; presumably you meant file, the variable holding the contents you just read.
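  • If expected_file is a String, expected_file.gsub! would mutate expected_file in place, so assigning the result to file_final is redundant. It's also risky: gsub! returns nil when nothing was replaced, so file_final could end up nil. Instead reuse expected_file, or, better, use: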

    file_final = expected_file.gsub(
    
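  • "abc is".* is an invalid parameter: Ruby parses it as a call to String#* with no argument, which is what the ArgumentError is complaining about. "abc is.*" would be closer as text, but gsub treats a String pattern literally; it appears you're reaching for the regular expression /abc is.*/. That isn't necessary to change the string, and it would also swallow everything to the end of the line, including the id column. /123/ or '123' would be sufficient.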

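  • When working line by line, gsub would be overkill too: each line needs only a single replacement, so sub would be faster. gsub is only needed when the whole file sits in one string, as in your code.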
  • Technically,

    File.open(f1, 'w') { |f| f.write(file_final) }
    

    will work, but it's much more easily written as

    File.write(f1, file_final)
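The difference between gsub and gsub! from the list above is easy to check for yourself. This is a small sketch using a sample line from the question; the variable names are mine.

```ruby
text = %("abc is 123 test", 1\n)

# gsub returns a new String and leaves the receiver untouched.
changed = text.gsub(/\b123\b/, '567')

# With no match, gsub still returns a String (a copy equal to the original)...
unchanged = text.gsub(/\b999\b/, '567')

# ...whereas gsub! returns nil when nothing was replaced.
bang_miss = text.dup.gsub!(/\b999\b/, '567')

p changed     # prints: "\"abc is 567 test\", 1\n"
p unchanged   # prints: "\"abc is 123 test\", 1\n"
p bang_miss   # prints: nil
```

That nil is why assigning the result of gsub! to a variable and then writing it to a file can go wrong on input that contains no match.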
    

You could reduce the code to:

File.write(
  'file.csv.new',
  File.read('file.csv').gsub(/\b123\b/, '456')
)

which, out of perverseness, could be written as:

File.write('file.csv.new', File.read('file.csv').gsub(/\b123\b/, '456'))

There'd be no improvement in speed; it'd only reduce readability.

Upvotes: 1
