Duncan

Reputation: 43

Perl Regex To Remove Commas Between Quotes?

I am trying to remove commas between double quotes in a string while leaving other commas intact. (The quoted part belongs to an email address, which sometimes contains stray commas.) The following "brute force" code works on my machine, but is there a more elegant way to do it, perhaps with a single regex?

$string = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <[email protected]>,1,2';
print "Initial string = ", $string, "<br>\n";

# Extract stuff between the quotes
$string =~ /\"(.*?)\"/;

$name = $1;
print "name = ", $1, "<br>\n";
# Delete all commas between the quotes
$name =~ s/,//g;
print "name minus commas = ", $name, "<br>\n";
# Put the modified name back between the quotes
$string =~ s/\"(.*?)\"/\"$name\"/;
print "new string = ", $string, "<br>\n";

Upvotes: 2

Views: 2095

Answers (2)

TLP

Reputation: 67900

One way is to use the core module Text::ParseWords to split the line into fields, then apply a simple transliteration to the affected field to get rid of the commas:

use strict;
use warnings;
use Text::ParseWords;

my $str = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <[email protected]>,1,2';
my @row = quotewords(',', 1, $str);
$row[2] =~ tr/,//d;
print join ",", @row;

Output:

06/14/2015,19:13:51,"Mrs Nkolika Nebedom" <[email protected]>,1,2

I assume that no commas can appear legitimately in your email field. Otherwise some other replacement method is required.
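If the quoted text can show up in any column rather than only the third, the same idea can be applied to every field. This is just a minimal sketch under the same assumption (no comma inside a field is wanted) and assuming the quotes are balanced and no field is empty; the helper name `strip_field_commas` is made up for illustration:

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical helper: delete commas from every field, not just $row[2].
sub strip_field_commas {
    my ($line) = @_;
    # keep = 1 retains the quotes, as in the snippet above
    my @row = quotewords(',', 1, $line);
    tr/,//d for @row;          # $_ is aliased to each field in turn
    return join ',', @row;
}

print strip_field_commas('a,"b,c",d,"e,,f" g'), "\n";   # prints: a,"bc",d,"ef" g
```

The commas that separate fields are consumed by `quotewords` and put back by `join`, so only commas inside a field are deleted.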

Upvotes: 2

Casimir et Hippolyte

Reputation: 89557

You can use this kind of pattern:

$string =~ s/(?:\G(?!\A)|[^"]*")[^",]*\K(?:,|"(*SKIP)(*FAIL))//g;

Pattern details:

(?: # two possible beginnings:
    \G(?!\A) # contiguous to the previous match
  |          # OR
    [^"]*"   # all characters until an opening quote
)
[^",]*     #"# all that is not a quote or a comma
\K           # discard all previous characters from the match result
(?:          # two possible cases:
    ,        # a comma is found, so it will be replaced
  |          # OR
    "(*SKIP)(*FAIL) #"# when the closing quote is reached, make the pattern fail
                      # and force the regex engine to not retry previous positions.
)
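To try the pattern out, it can be written with the `/x` flag and wrapped in a small subroutine. This is only a sketch: the name `remove_quoted_commas` and the test string are made up for illustration, and it needs a Perl that supports `\K` and the backtracking verbs:

```perl
use strict;
use warnings;

# Hypothetical helper name; the pattern itself is the one explained above.
sub remove_quoted_commas {
    my ($s) = @_;
    $s =~ s{
        (?: \G(?!\A) | [^"]*" )    # continue after the previous match, or find an opening quote
        [^",]*                     # text up to the next comma or quote
        \K                         # drop everything matched so far from the match
        (?: , | "(*SKIP)(*FAIL) )  # delete a comma, or give up at the closing quote
    }{}gx;
    return $s;
}

print remove_quoted_commas('a,"b,c,,d" x,1,"e,f",2'), "\n";   # prints: a,"bcd" x,1,"ef",2
```

Commas outside the quotes survive because a match can only start at an opening quote or directly after a previously removed comma.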

If you use an older Perl version (before 5.10), \K and the backtracking control verbs are not supported. In this case you can use this pattern with capture groups:

$string =~ s/((?:\G(?!\A)|[^"]*")[^",]*)(?:,|("[^"]*(?:"|\z)))/$1$2/g;
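One thing to be aware of: whenever the comma branch matches, `$2` is undefined, so the plain `$1$2` replacement may emit an "uninitialized value" warning under `use warnings`. A possible workaround, sketched here with a made-up helper name, is to build the replacement with `/e` and an explicit `defined` check, which also works on old Perl versions:

```perl
use strict;
use warnings;

# Hypothetical helper name; same pattern as above, with a guarded replacement.
sub remove_quoted_commas_compat {
    my ($s) = @_;
    # $2 is only set when the closing-quote branch matched
    $s =~ s/((?:\G(?!\A)|[^"]*")[^",]*)(?:,|("[^"]*(?:"|\z)))/$1 . (defined $2 ? $2 : '')/ge;
    return $s;
}

print remove_quoted_commas_compat('a,"b,c,,d" x,1,"e,f",2'), "\n";   # prints: a,"bcd" x,1,"ef",2
```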

Upvotes: 3
