Jes
Jes

Reputation: 343

How to create CSV from a text file in Ruby

I need to create a CSV file from a textfile with billing data about my calls. My textfile has a structure like:

01.02.2016 10:35:49 8998775 New York 3:35 0,00 0,00

I create the CSV using:

require 'csv'
  @calls = File.new("modified_billing", "r")
  CSV.open("new.csv", 'wb', write_headers: true,
    headers: ["Date", "Time", "Phone number","City","Duration", "Cost", "Cost of call"]) do |csv|
    @calls.each do |call|
      csv << call.split(" ")
    end
  end

It works for cities which have singular name, but obviously it doesn't for "New York", "Las Vegas", etc. because it creates two columns out of them.

Upvotes: 3

Views: 2227

Answers (3)

the Tin Man
the Tin Man

Reputation: 160553

There are a bunch'a ways to solve this. Here's how I've done similar things in the past:

str = '01.02.2016 10:35:49 8998775 New York 3:35 0,00 0,00'
/(\S+) (\S+) (\d+) (.+) (\S+) (\S+) (\S+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "New York", "3:35", "0,00", "0,00"]

str = '01.02.2016 10:35:49 8998775 Chicago 3:35 0,00 0,00'
/(\S+) (\S+) (\d+) (.+) (\S+) (\S+) (\S+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "Chicago", "3:35", "0,00", "0,00"]

Since you know the fields are usually space delimited you can use those to your advantage.

If you need to validate the fields a bit more:

str = '01.02.2016 10:35:49 8998775 New York 3:35 0,00 0,00'
/([\d.]+) ([\d:]+) (\d+) (\D+?) ([\d+:]+) ([\d,]+) ([\d,]+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "New York", "3:35", "0,00", "0,00"]

str = '01.02.2016 10:35:49 8998775 Chicago 3:35 0,00 0,00'
/([\d.]+) ([\d:]+) (\d+) (\D+?) ([\d+:]+) ([\d,]+) ([\d,]+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "Chicago", "3:35", "0,00", "0,00"]

Sometimes I've used something more like a template:

str = '01.02.2016 10:35:49 8998775 New York 3:35 0,00 0,00'
/((?:\d{2}\.){2}\d{4}) ((?:\d{2}:){2}\d{2}) (\d+) (\D+?) (\d+:\d+) ([\d,]+) ([\d,]+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "New York", "3:35", "0,00", "0,00"]

str = '01.02.2016 10:35:49 8998775 Chicago 3:35 0,00 0,00'
/((?:\d{2}\.){2}\d{4}) ((?:\d{2}:){2}\d{2}) (\d+) (\D+?) (\d+:\d+) ([\d,]+) ([\d,]+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "Chicago", "3:35", "0,00", "0,00"]

Where:

  • \d{2} means "two digits".
  • (?:\d{2}\.) means "consider two digits and a . as a group but don't capture ("remember") it.
  • (?:\d{2}\.){2} means "do it twice".
  • ((?:\d{2}\.){2}\d{4}) means "remember all that plus the next four digits".

Knowing that you can work out the rest of the pattern.

The advantage is once the patterns are figured out it's a matter of figuring out how many times they repeat. If the source text changes later then it's pretty simple to tweak the numbers. And this is the power of regular expressions, they're great when you have patterns that repeat.

All that said, I prefer to not use regular expressions because they tend to be fragile and, if you're not aware of how the engine parses, they can really slow down your code. Instead I'd go with something like Damien's, using split to take apart the string, then shift and pop resulting in the city being left.

Upvotes: 4

user419017
user419017

Reputation:

I think you've almost got it. Here's a simple way of doing it without regex:

string = '01.02.2016 10:35:49 8998775 New York 3:35 0,00 0,00'
data = string.split(' ')

data.shift(3)
# => ["01.02.2016", "10:35:49", "8998775"]

data.pop(3)
# => ["3:35", "0,00", "0,00"]

data.join(' ')
# => "New York"

# putting it together
first, third, second = data.shift(3), data.pop(3), [data.join(' ')]
csv << first + second + third

Something more compact, though a little harder to read:

data = call.split(' ')
csv << [data.shift(3), data.pop(3)].insert(1, data.join(' ')).flatten

Upvotes: 6

Eric Duminil
Eric Duminil

Reputation: 54213

Here's a Regexp that matches your example. Without other lines, it's hard to tell if it will work for every call. You'll get a "Cannot parse" warning for calls that don't get matched by the Regexp. If there are multiples spaces or tabs, you can replace all the ' ' by '\s+'.

if call=~/(\d\d\.\d\d\.\d\d\d\d) (\d\d:\d\d:\d\d) (\d+) (.*?) (\d+:\d\d) (\d+,\d\d) (\d+,\d\d)/ then
  csv << Regexp.last_match.captures
else
  puts "Cannot parse : #{call}"
end

Upvotes: 0

Related Questions