Reputation: 1671
I am trying to remove the commas inside double quotes from a CSV file in Notepad++. This is what I have:
1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"
and I need this:
1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"
I am trying to use the Notepad++ find/replace option with a regex pattern. I tried all kinds of combinations but didn't manage to do it :( The file contains 1 million rows.
After a whole day of this I am no longer sure whether a simple regex can do it. Maybe I should go with a script... Python?
Upvotes: 16
Views: 41134
Reputation: 19
For lines with multiple commas inside double quotes, I can think of the following Perl script. It needs a header line (starting with #) that contains no quoted commas, so that the script knows how many comma-separated fields there should be.
#! /usr/bin/perl -w
use strict;
my $n_fields = 0; # stays 0 until a header line has been read
while (<>) {
    s/\s+$//; # strip trailing whitespace, including the newline
    if (/^\#/) { # header line
        my @t = split(/,/);
        $n_fields = scalar(@t); # total number of fields
    } else { # actual data
        my $n_commas = $_ =~ s/,/,/g; # total number of commas
        foreach my $i (0 .. $n_commas - $n_fields) { # iterate ($n_commas - $n_fields + 1) times
            s/(\"[^",]+),([^"]+\")/$1\\x2c$2/g; # replace one quoted comma per field with the literal text \x2c
        }
        s/\"//g; # removal of double quotes (if you want)
    }
    print "$_\n";
}
Upvotes: 0
Reputation: 171
Try the following
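import re
# string holds the text to clean, e.g. one line of the CSV file
print(re.sub(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)', "", string))
This removes every comma that is followed by an odd number of double quotes, i.e. every comma between quotes.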
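For a file with a million rows it may be more practical to apply the pattern one line at a time rather than to the whole file as a single string. The following is only a sketch under that assumption: the names strip_quoted_commas and clean_file are made up for illustration, and it assumes quoted fields never span several lines and never contain escaped quotes.

```python
import re

# A comma counts as "inside quotes" when an odd number of double quotes
# follows it up to the end of the line.
INSIDE_QUOTES = re.compile(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)')


def strip_quoted_commas(line):
    """Return the line with every comma inside double quotes removed."""
    return INSIDE_QUOTES.sub("", line)


def clean_file(src_path, dst_path):
    """Hypothetical helper: stream src_path to dst_path line by line."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        for line in src:
            dst.write(strip_quoted_commas(line))


sample = '1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"'
print(strip_quoted_commas(sample))
# 1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"
```

Working per line keeps memory use flat and limits how far the lookahead has to scan for each comma.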
Upvotes: 15
Reputation: 8148
Just an update to @zx81's brilliant solution. Let's say you have 2 commas between the quotes.
Then the search regex has to be modified as follows:
("[^",]+),([^",]+),([^"]+")
Replace needs to be modified as
$1$2$3
And so on: modify it depending on the number of commas.
I tried to find out whether a recursive regex could handle this, but that does not seem to be possible as of now.
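If a script is an option, the number of commas does not have to be known in advance. The sketch below is a different technique from the regexes in this answer: it matches each quoted field as a whole and removes the commas inside it with a replacement function. The function name is made up for illustration, and it assumes fields do not contain escaped quotes such as "".

```python
import re

# Matches one complete quoted field: an opening quote, any non-quote
# characters, and the closing quote.
QUOTED_FIELD = re.compile(r'"[^"]*"')


def remove_commas_in_quotes(line):
    """Remove every comma inside each quoted field, however many there are."""
    return QUOTED_FIELD.sub(lambda m: m.group(0).replace(",", ""), line)


print(remove_commas_in_quotes('"a,b,c",1,"2,606,123.45"'))
# "abc",1,"2606123.45"
```

Because the replacement function sees the whole field, one pass is enough even when a field holds several commas.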
Upvotes: 6
Reputation: 41848
mrki, this will do what you want (tested in N++):
Search: ("[^",]+),([^"]+")
Replace: $1$2
or \1\2
How does this work? The first parentheses capture the beginning of the string up to (but not including) the comma into Group 1. The second parentheses capture the end of the string after the comma into Group 2. The replacement substitutes the string with a concatenation of Group 1 and Group 2.
In more detail: in the first parentheses, we match the opening double quote, then one or more characters that are neither a double quote nor a comma. That is the meaning of [^",]+. In the second parentheses, we match one or more characters that are not a double quote with [^"]+, then the closing double quote.
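To see what the two groups capture, the same pattern can be tried outside Notepad++. This is just an illustrative sketch in Python, where group references in the replacement are written \1\2 rather than $1$2:

```python
import re

pattern = re.compile(r'("[^",]+),([^"]+")')
line = '1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"'

first = pattern.search(line)
print(first.group(1))  # "COLLAR B    (opening quote up to the comma)
print(first.group(2))  #  M STAY"     (after the comma through the closing quote)

# Replace every match with Group 1 followed by Group 2, dropping the comma.
print(pattern.sub(r"\1\2", line))
# 1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"
```

Each pass removes at most one comma per quoted field, so a field with several commas needs the replacement to be run again.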
Upvotes: 37