MrSilverSnorkel

Reputation: 509

Remove comma as thousands separator from quoted numbers in a CSV using sed

My sed is pretty shaky, so I'm not sure how to take a row like this

1,2,"12,345",x,y,"a,b"

and turn it into

1,2,12345,x,y,"a,b"

So the number "12,345" becomes 12345, but "a,b" remains untouched.

I need to preserve the digits on either side of the comma when the value is numeric. I have an idea of what the regex would look like to match only digits, but I'm not sure how to remove just the comma rather than the whole column.

Upvotes: 4

Views: 3106

Answers (5)

alpha bravo

Reputation: 7948

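Use this pattern: (\d),(\d)(?!(([^"]*"){2})*[^"]*$) and replace with $1$2. It needs a regex engine that supports lookahead (Perl or PCRE, for example), so it will not work in sed itself.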

Upvotes: 0

Firas Dib

Reputation: 2621

In one regex substitution you could do something as nasty as this: /\G(?|(")(\d+)(?:,(\d+))*(")|()([^,]+)()())(,|$)/g replace with \1\2\3\4\5

This should work fine with Perl.

demo: http://regex101.com/r/kQ5fU1

Upvotes: 1

jaypal singh

Reputation: 77085

CSV should be parsed with a proper CSV parser. I would recommend Perl as well.

perl -MText::ParseWords -ne '
    @line = parse_line(",", 1, $_); 
    print join "," , map { s/,//g if $_ =~ /^[0-9,"]+$/; $_ } @line
' text.csv

Test:

$ cat text.csv
1,2,"12,345",x,y,"a,b"
"a,c","12,345",x,y,"a,b"

$ perl -MText::ParseWords -ne '
    @line = parse_line(",", 1, $_);
    print join "," , map { s/,//g if $_ =~ /^[0-9,"]+$/; $_ } @line
' text.csv
1,2,"12345",x,y,"a,b"
"a,c","12345",x,y,"a,b"

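To make the changes in place, use the -i option, or redirect the output to another file. Note that the quotes around the number are kept; to drop them as well, change s/,//g to s/[",]//g.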

Upvotes: 2

Nicolas De Jay

Reputation: 444

You can use:

echo '1,2,"12,345",x,y,"a,b"' | sed 's/"\([0-9][0-9]*\),\([0-9][0-9]*\)"/\1\2/g'

EDIT: Actually, my solution only works if there is one comma enclosed between double quotes.
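One way around the single-comma limit is to repeat the substitution in a sed loop. This is only a sketch under a few assumptions: -E (extended regex) is available, as it is in GNU and BSD sed; fields never contain escaped quotes (""); and the function name strip_thousands is invented for the example.

```shell
# Hypothetical helper, reads CSV rows on stdin.
# Step 1 (label a .. ta): remove one comma from a quoted, digits-and-commas-only
#   field, then loop until no such comma is left on the line.
# Step 2: strip the quotes from quoted fields that are now digits only.
strip_thousands() {
    sed -E \
        -e ':a' \
        -e 's/"([0-9]+),([0-9]+(,[0-9]+)*)"/"\1\2"/' \
        -e 'ta' \
        -e 's/"([0-9]+)"/\1/g'
}

# Example run on the row from the question.
printf '%s\n' '1,2,"12,345",x,y,"a,b"' | strip_thousands
```

On the sample row this prints 1,2,12345,x,y,"a,b". The second step also unquotes fields such as "123" that had no comma to begin with, which may or may not be what you want.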

Upvotes: 0

choroba

Reputation: 241768

Perl solution, using Text::CSV:

#!/usr/bin/perl
use warnings;
use strict;

use Text::CSV;

my @rows;

my $csv = 'Text::CSV'->new({ binary => 1 }) or die 'Text::CSV'->error_diag;
open my $IN, '<', 'file.csv' or die $!;
while (my $row = $csv->getline($IN)) {
    for my $cell (@$row) {
        $cell =~ s/,//g if $cell =~ /^[0-9,]+$/;    # /g removes every comma, e.g. 1,234,567
    }
    push @rows, $row;
}
$csv->eof or $csv->error_diag;

open my $OUT, '>', 'new.csv' or die $!;
$csv->eol("\n");    # print() appends no line ending unless eol is set
$csv->print($OUT, $_) for @rows;
close $OUT or die $!;

Upvotes: 1
