Reputation: 727
I just received a file from a colleague and don't know how to parse this:
Input:
key,value1,"value2,hello"
Desired output:
key,value2
Perl or Python are the languages I understand.
Thanks, Bernardo
Upvotes: 1
Views: 449
Reputation: 69264
The standard Perl module Text::ParseWords can be used to handle CSV files.
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;
while (<DATA>) {
my @fields = parse_line(',', 0, $_);
# Do something useful with the data in @fields
print join ' | ', @fields;
}
__DATA__
key,value1,"value2,hello"
Upvotes: 3
Reputation: 67968
(.*?)\,.*?\"(.*?)\,.*
You can try this.
See demo.
http://regex101.com/r/rI6jZ0/2
Upvotes: 0
Reputation: 656
If you want to use regular expressions for this task, the following should work:
(\S+,)\d+,\"(\d+),\S+\"
(\S+,)
is a first capturing group that selects the first key, including the comma. This is followed by some numbers, a comma and a quote \d+,\"
. The second capturing group (\d+)
selects the second value, which is followed by a comma, a string and a quote: ,\D+\"
But as others already wrote, there other solutions that don't involve regular expressions.
Upvotes: 0
Reputation: 172270
This is valid CSV syntax, so you can just use a CSV parser.
You did not specify which language you are using, but most have a CSV parser readily available in the class library (for example, TextFieldParser in .NET) or as an external component (for example, CSVParser in Apache Commons for Java).
If you do want to re-invent the wheel (which I do not recommend), the algorithm is quite simple:
result = "", inQuotes = false
read next character
if end-of-line:
if inQuotes:
throw error (unmatched quotes)
yield result
return
else if character = '"':
invert inQuotes
else if character = ',' and not inQuotes:
yield result
result = ""
else:
result += character
Upvotes: 2