Upment

Reputation: 1671

Regex to remove commas between double quotes in Notepad++

I am trying to remove commas inside double quotes from a CSV file in Notepad++. This is what I have:

1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"

and I need this:

1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"

I am trying to use the Notepad++ Find/Replace option with a regular expression. I tried all kinds of combinations but didn't manage to do it :( The file contains 1 million rows.

After a whole day on this, I am no longer sure a simple regex can do it. Maybe I should write a script instead... Python?
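For the script route the question mentions, here is a minimal sketch using Python's standard csv module. This is a different technique from the regex approaches: a real CSV parser splits each row on the separating commas only, so any comma still left inside a field value must have come from a quoted field and can simply be deleted. The function name strip_quoted_commas is hypothetical. One difference from the requested output: csv.writer (default QUOTE_MINIMAL) only quotes fields that still need it, so "COLLAR B M STAY" is written without quotes; pass quoting=csv.QUOTE_ALL to csv.writer if every field should be quoted.

```python
import csv
import io


def strip_quoted_commas(src, dst):
    """Copy CSV rows from src to dst, deleting the commas inside every field.

    csv.reader has already split each row on the real separators, so any
    comma left inside a field value came from a quoted field.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:  # rows are read one at a time, so large files are streamed
        writer.writerow([field.replace(",", "") for field in row])


sample = '1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"\n'
out = io.StringIO()
strip_quoted_commas(io.StringIO(sample), out)
print(out.getvalue(), end="")  # prints: 1070,17,2,GN3-670,COLLAR B M STAY,2606.45
```

For the real file, pass file objects opened with newline="", e.g. with open("in.csv", newline="") as src, open("out.csv", "w", newline="") as dst: strip_quoted_commas(src, dst) (the file names are placeholders). Unlike the regex approaches, the parser also copes with escaped quotes ("") inside a field.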

Upvotes: 16

Views: 41134

Answers (4)

Yiran Guo

Reputation: 19

For lines with several commas inside double quotes (in one field or across several), I can think of the following Perl script. It needs a header line that starts with # and has no commas inside quotes, so that the script knows how many comma-separated fields each line should have.

#!/usr/bin/perl
use strict;
use warnings;

my $n_fields = 0;    # number of fields, read from the header line
while (<>) {
    s/\s+$//;                          # strip trailing whitespace and the newline
    if (/^#/) {                        # header line: must start with '#'
        my @t = split /,/;
        $n_fields = scalar @t;         # total number of fields
    } else {                           # actual data
        my $n_commas = tr/,//;         # total number of commas on the line
        # run the substitution ($n_commas - $n_fields + 1) times, i.e. once per extra comma;
        # each pass removes the first comma of every quoted field
        foreach my $i (0 .. $n_commas - $n_fields) {
            s/("[^",]+),([^"]+")/$1$2/g;
        }
        s/"//g;                        # remove the double quotes (optional)
    }
    print "$_\n";
}
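The Perl loop uses the header to work out how many times to repeat the single-comma substitution. A minimal Python sketch of the same idea, under a different stopping rule: it repeats the substitution until a pass removes nothing, so no header line is needed. It keeps the double quotes, as in the question; the name remove_quoted_commas is hypothetical.

```python
import re

# Single-comma pattern: removes the first comma of each quoted field per pass
ONE_COMMA = re.compile(r'("[^",]+),([^"]+")')


def remove_quoted_commas(line):
    """Repeat the single-comma substitution until the line stops changing."""
    while True:
        line, n = ONE_COMMA.subn(r"\1\2", line)
        if n == 0:  # no comma was removed in this pass, so we are done
            return line


print(remove_quoted_commas('1070,"A, B, C","2,606.45"'))  # prints: 1070,"A B C","2606.45"
```

Every pass that changes the line removes at least one comma, so the loop always terminates.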

Upvotes: 0

Nithin

Reputation: 171

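Try the following (Python 3):

import re

line = '1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"'
# remove every comma that is followed by an odd number of double quotes
print(re.sub(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)', "", line))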

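This removes every comma between double quotes: the lookahead only lets a comma match when it is followed by an odd number of " characters up to the end of the string, which means the comma is inside a quoted field.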
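If the whole 1-million-row file is read into one string, the lookahead scans from each comma all the way to the end of that string, so the total work grows quadratically with file size. A minimal sketch that applies the same pattern one line at a time instead (the function name clean_lines is hypothetical):

```python
import io
import re

# A comma followed by an odd number of double quotes up to the end of the
# line is inside a quoted field.
QUOTED_COMMA = re.compile(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)')


def clean_lines(src, dst):
    """Substitute line by line so the lookahead only scans the current line."""
    for line in src:
        dst.write(QUOTED_COMMA.sub("", line))


out = io.StringIO()
clean_lines(io.StringIO('1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"\n'), out)
print(out.getvalue(), end="")  # prints: 1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"
```

For the real file, open("in.csv") and open("out.csv", "w") can be passed as src and dst (placeholder names). This assumes no quoted field spans more than one line.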

Upvotes: 15

Anand Sunderraman

Reputation: 8148

Just an update to @zx81's brilliant solution. Let's say you have 2 commas between the quotes.

Then the regex search has to be modified as follows:

("[^",]+),([^",]+),([^"]+")

The replacement then needs to be:

$1$2$3

And so on: add one capture group for each additional comma.

I tried to see whether a recursive regex could handle any number of commas, but that does not seem to be possible as of now.
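Outside Notepad++, in a script, the fixed number of capture groups can be avoided altogether: Python's re.sub accepts a function as the replacement, so each quoted field can be matched whole and all of its commas removed in a single pass. A minimal sketch of that alternative (the name drop_commas_in_quotes is hypothetical):

```python
import re

# A whole quoted field: opening quote, any non-quote characters, closing quote
QUOTED_FIELD = re.compile(r'"[^"]*"')


def drop_commas_in_quotes(line):
    # The replacement function receives each quoted field and strips every
    # comma in it, so any number of commas per field is handled in one pass.
    return QUOTED_FIELD.sub(lambda m: m.group(0).replace(",", ""), line)


print(drop_commas_in_quotes('1,"A, B, C, D","2,606.45"'))  # prints: 1,"A B C D","2606.45"
```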

Upvotes: 6

zx81

Reputation: 41848

mrki, this will do what you want (tested in N++):

Search: ("[^",]+),([^"]+")

Replace: $1$2 or \1\2

How does this work? The first parentheses capture the beginning of the string up to (but not including) the comma into Group 1. The second parentheses capture the end of the string after the comma into Group 2. The replacement substitutes the string with a concatenation of Group 1 and Group 2.

In more detail: in the first parentheses, we match the opening double quote, then one or more characters that are neither a double quote nor a comma. That is the meaning of [^",]+. In the second parentheses, we match one or more characters that are not a double quote with [^"]+, then the closing double quote.
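The same search and replace can be checked in Python, where the groups are written \1\2 in the replacement. This sketch also shows a limitation that follows from the explanation: Group 1 stops at the first comma and Group 2 runs to the closing quote, so one pass removes only the first comma of each quoted field, and a field with several commas needs the replacement repeated.

```python
import re

# Search pattern from this answer; Python's re refers to the groups as \1 and \2
PATTERN = r'("[^",]+),([^"]+")'

print(re.sub(PATTERN, r"\1\2", '1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"'))
# prints: 1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"
print(re.sub(PATTERN, r"\1\2", '1,"A, B, C"'))
# prints: 1,"A B, C"  (only the first comma in the field is removed)
```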

Upvotes: 37

Related Questions