Reputation: 343
I need to create a CSV file from a textfile with billing data about my calls. My textfile has a structure like:
01.02.2016 10:35:49 8998775 New York 3:35 0,00 0,00
I create the CSV using:
require 'csv'
@calls = File.new("modified_billing", "r")
CSV.open("new.csv", 'wb', write_headers: true,
headers: ["Date", "Time", "Phone number","City","Duration", "Cost", "Cost of call"]) do |csv|
@calls.each do |call|
csv << call.split(" ")
end
end
It works for cities which have singular name, but obviously it doesn't for "New York", "Las Vegas", etc. because it creates two columns out of them.
Upvotes: 3
Views: 2227
Reputation: 160553
There are a bunch'a ways to solve this. Here's how I've done similar things in the past:
str = '01.02.2016 10:35:49 8998775 New York 3:35 0,00 0,00'
/(\S+) (\S+) (\d+) (.+) (\S+) (\S+) (\S+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "New York", "3:35", "0,00", "0,00"]
str = '01.02.2016 10:35:49 8998775 Chicago 3:35 0,00 0,00'
/(\S+) (\S+) (\d+) (.+) (\S+) (\S+) (\S+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "Chicago", "3:35", "0,00", "0,00"]
Since you know the fields are usually space delimited you can use those to your advantage.
If you need to validate the fields a bit more:
str = '01.02.2016 10:35:49 8998775 New York 3:35 0,00 0,00'
/([\d.]+) ([\d:]+) (\d+) (\D+?) ([\d+:]+) ([\d,]+) ([\d,]+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "New York", "3:35", "0,00", "0,00"]
str = '01.02.2016 10:35:49 8998775 Chicago 3:35 0,00 0,00'
/([\d.]+) ([\d:]+) (\d+) (\D+?) ([\d+:]+) ([\d,]+) ([\d,]+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "Chicago", "3:35", "0,00", "0,00"]
Sometimes I've used something more like a template:
str = '01.02.2016 10:35:49 8998775 New York 3:35 0,00 0,00'
/((?:\d{2}\.){2}\d{4}) ((?:\d{2}:){2}\d{2}) (\d+) (\D+?) (\d+:\d+) ([\d,]+) ([\d,]+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "New York", "3:35", "0,00", "0,00"]
str = '01.02.2016 10:35:49 8998775 Chicago 3:35 0,00 0,00'
/((?:\d{2}\.){2}\d{4}) ((?:\d{2}:){2}\d{2}) (\d+) (\D+?) (\d+:\d+) ([\d,]+) ([\d,]+)/.match(str).captures
# => ["01.02.2016", "10:35:49", "8998775", "Chicago", "3:35", "0,00", "0,00"]
Where:
\d{2}
means "two digits".(?:\d{2}\.)
means "consider two digits and a .
as a group but don't capture ("remember") it.(?:\d{2}\.){2}
means "do it twice".((?:\d{2}\.){2}\d{4})
means "remember all that plus the next four digits".Knowing that you can work out the rest of the pattern.
The advantage is once the patterns are figured out it's a matter of figuring out how many times they repeat. If the source text changes later then it's pretty simple to tweak the numbers. And this is the power of regular expressions, they're great when you have patterns that repeat.
All that said, I prefer to not use regular expressions because they tend to be fragile and, if you're not aware of how the engine parses, they can really slow down your code. Instead I'd go with something like Damien's, using split
to take apart the string, then shift
and pop
resulting in the city being left.
Upvotes: 4
Reputation:
I think you've almost got it. Here's a simple way of doing it without regex:
string = '01.02.2016 10:35:49 8998775 New York 3:35 0,00 0,00'
data = string.split(' ')
data.shift(3)
# => ["01.02.2016", "10:35:49", "8998775"]
data.pop(3)
# => ["3:35", "0,00", "0,00"]
data.join(' ')
# => "New York"
# putting it together
first, third, second = data.shift(3), data.pop(3), [data.join(' ')]
csv << first + second + third
Something more compact, though a little harder to read:
data = call.split(' ')
csv << [data.shift(3), data.pop(3)].insert(1, data.join(' ')).flatten
Upvotes: 6
Reputation: 54213
Here's a Regexp that matches your example. Without other lines, it's hard to tell if it will work for every call. You'll get a "Cannot parse" warning for calls that don't get matched by the Regexp. If there are multiples spaces or tabs, you can replace all the ' ' by '\s+'.
if call=~/(\d\d\.\d\d\.\d\d\d\d) (\d\d:\d\d:\d\d) (\d+) (.*?) (\d+:\d\d) (\d+,\d\d) (\d+,\d\d)/ then
csv << Regexp.last_match.captures
else
puts "Cannot parse : #{call}"
end
Upvotes: 0