I have a CSV file in this format:
value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4
I want to replace the commas within the outermost quotes of the third field with ';' and remove the inner quotes. I have tried using sed, but nothing I tried managed to replace the nested quotes.
You could do it like this.
The criterion is an even number of quotes within a quoted field, where the field is delimited by commas as the field separator.
Note that if the CSV does not meet this criterion, nothing will save it; it cannot be parsed unambiguously.
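To make the criterion concrete, here is a small sketch that checks whether a quoted field has an even number of quotes between its outermost pair. The helper name inner_quotes_even is hypothetical (it is not part of the answer), and it assumes the field is passed with its outer quotes still attached. Note that the regex below additionally requires at least one inner pair (the + quantifier), so a field with no inner quotes is simply left alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): returns true when a field that
# is wrapped in double quotes has an even number of quote characters
# between its outermost pair.
sub inner_quotes_even {
    my ($field) = @_;
    # Capture everything between the first and the last quote of the field.
    return 0 unless $field =~ /^\s*"(.*)"\s*$/s;
    my $inner = $1;
    # tr with an empty replacement list counts matches without changing $inner.
    my $count = ($inner =~ tr/"//);
    return $count % 2 == 0;
}

for my $field ('"some text, "nested quotes", more"', '"broken "quote, here"') {
    printf "%-36s %s\n", $field, inner_quotes_even($field) ? 'parsable' : 'not parsable';
}
```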
(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))
Formatted:
(?: ^ | , )
\s*
\K
"
( # (1 start)
[^"]*
(?: # Inner, even number of quotes
"
[^"]*
"
[^"]*
)+
) # (1 end)
"
(?=
\s*
(?: , | $ )
)
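Because the formatted version is laid out for the /x modifier, it can be used as-is inside a qr// block. A sketch that compiles it that way and collects what group 1 captures for the sample line from the question (the variable names are illustrative):

```perl
use strict;
use warnings;

my $data = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

# The formatted pattern, compiled with /x so the layout whitespace and the
# inline comments are ignored.
my $field_re = qr/
    (?: ^ | , )
    \s*
    \K                      # leading comma and spaces stay out of the match
    "
    (                       # (1 start)
        [^"]*
        (?:                 # Inner, even number of quotes
            "
            [^"]*
            "
            [^"]*
        )+
    )                       # (1 end)
    "
    (?=
        \s*
        (?: , | $ )
    )
/x;

# Collect group 1 (the field text without its outermost quotes) for every match.
my @captured;
while ($data =~ /$field_re/g) {
    push @captured, $1;
}
print "$_\n" for @captured;
```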
Perl sample:
use strict;
use warnings;
use v5.14;    # tr///r (non-destructive transliteration) needs Perl 5.14+

my $data = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

# Turn commas into ';', delete the inner quotes, then re-wrap in outer quotes.
sub innerRepl
{
    my ($in) = @_;
    return '"' . ($in =~ tr/,"/;/dr ) . '"';
}

$data =~ s/(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))/ innerRepl( $1 ) /eg;
print $data, "\n";
Output:
value1, value2, "some text in the; quotes; with commas and nested quotes; some more text", value3, value4
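The question is about a CSV file rather than a single string. A sketch that applies the same substitution to every line read from a filehandle; the sub name fix_lines is hypothetical, and the demo uses in-memory filehandles so it runs on its own. It inlines the transliteration in an s{}{}e replacement instead of calling innerRepl, which should behave the same. It assumes no quoted field spans more than one line.

```perl
use strict;
use warnings;
use v5.14;    # tr///r requires Perl 5.14 or later

# Same pattern as the answer: a quoted field containing pairs of inner
# quotes, delimited by commas or the start/end of the line.
my $field_re = qr/(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))/;

# Hypothetical helper: rewrite every line from $in and print it to $out.
sub fix_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        # Commas become ';', inner quotes are deleted, outer quotes are restored.
        $line =~ s{$field_re}{'"' . ($1 =~ tr/,"/;/dr) . '"'}eg;
        print {$out} $line;
    }
}

# Demo with in-memory filehandles standing in for real files.
my $csv = qq{a, b, "x, "y", z", c\nplain, row\n};
open my $in, '<', \$csv or die "Cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "Cannot open output: $!";
fix_lines($in, $out);
close $out;
print $result;
```

For a real file, calling fix_lines(\*STDIN, \*STDOUT) and running the script as perl fix_csv.pl < in.csv > out.csv (fix_csv.pl being a hypothetical name) should work the same way.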
You need a recursive regex to match nested quotes. The tidiest way to alter the quotes and commas is an expression substitution (the /e modifier) combined with a non-destructive transliteration (tr///r), which became available in Perl v5.14.
Like this:
use strict;
use warnings 'all';
use v5.14;
my $str = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';
$str =~ s{ " ( (?: [^"]++ | (?R) )* ) " }{ $1 =~ tr/,"/;/dr }egx;
print $str, "\n";
Output:
value1, value2, some text in the; quotes; with commas and nested quotes; some more text, value3, value4
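The output of this version also drops the outermost quotes, whereas the question asks to remove only the inner ones. A minimal variation, assuming you want to keep the outer pair, just puts them back around the transliterated text in the replacement:

```perl
use strict;
use warnings 'all';
use v5.14;

my $str = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

# Same recursive pattern; the replacement wraps the transliterated text
# in a fresh pair of quotes so only the inner quotes disappear.
$str =~ s{ " ( (?: [^"]++ | (?R) )* ) " }{ '"' . ( $1 =~ tr/,"/;/dr ) . '"' }egx;

print $str, "\n";
```

Here (?R) recurses into the whole pattern, so each nested "..." pair is consumed as a unit, and the possessive [^"]++ stops the engine from backtracking into runs of non-quote characters.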