Reputation: 25
Is there a simple way to remove a comma from a CSV that isn't supposed to be there, using awk or sed?
The .csv has 5 columns, but a stray comma makes some rows come out with 6 columns. In the example below I expect the third column to be a number, but in rows 3 and 4 it is a string:
4,abc,323,123,acvd
4,abc,323,123,acvd
4,ab,c,323,123,acvd
4,db,c,323,123,acvd
I didn't want to over-engineer the solution, although I could probably use pandas/Python.
It is also a very large file (5 GB, over a million rows), so it isn't something I can open in Excel.
The desired output would either merge the extra string into the second column or remove it, i.e. one of the following:
4,abc,323,123,acvd
4,abc,323,123,acvd
4,abc,323,123,acvd
4,dbc,323,123,acvd
or
4,abc,323,123,acvd
4,abc,323,123,acvd
4,ab,323,123,acvd
4,db,323,123,acvd
Upvotes: 1
Views: 729
Reputation: 203209
Since removing the extra field is an acceptable option for you:
$ awk 'BEGIN{FS=OFS=","} {print $1, $2, $(NF-2), $(NF-1), $NF}' file
4,abc,323,123,acvd
4,abc,323,123,acvd
4,ab,323,123,acvd
4,db,323,123,acvd
Otherwise, to merge the extra text into the second field instead:
$ awk 'BEGIN{FS=OFS=","} {print $1, $2 (NF>5 ? $3 : ""), $(NF-2), $(NF-1), $NF}' file
4,abc,323,123,acvd
4,abc,323,123,acvd
4,abc,323,123,acvd
4,dbc,323,123,acvd
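Before rewriting a multi-gigabyte file, it may be worth checking how many rows actually have an unexpected number of fields, and running the same check again on the output. A minimal sketch, assuming no field is quoted so that every comma is a separator; the function name count_rows_by_width is hypothetical:

```shell
# Print "<number of fields> <number of rows>" for each distinct row width.
# Assumption: plain comma-separated data with no quoted fields.
count_rows_by_width() {
  awk -F',' '
    { rows[NF]++ }                              # tally rows by field count
    END { for (n in rows) print n, rows[n] }    # awk iteration order is unspecified
  ' "$@" | sort -n                              # so sort numerically by width
}
```

Calling `count_rows_by_width file` on the original data should list both 5 and 6; after the fix only 5 should remain.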
Upvotes: 5
Reputation: 784958
This simple awk script would also do the job:
awk 'BEGIN {FS=OFS=","} NF==6 {$2 = $2 $3; $3=""; sub(/,,/, ",")} 1' file.csv
4,abc,323,123,acvd
4,abc,323,123,acvd
4,abc,323,123,acvd
4,dbc,323,123,acvd
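The script above handles exactly one stray comma (NF==6). If the second column could contain several stray commas, something like the following sketch might work. It assumes 5 expected columns, that stray commas only ever occur in the second column, and that no field is quoted; merge_stray_commas and the want variable are illustrative names, not part of the original answer:

```shell
# Merge every surplus piece into field 2 so each row ends up with `want` fields.
# Assumptions: stray commas appear only in column 2; no quoted fields.
merge_stray_commas() {
  awk -v want=5 'BEGIN { FS = OFS = "," }
  NF > want {
    extra = NF - want                          # number of stray commas in this row
    merged = $2
    for (i = 3; i <= 2 + extra; i++)           # glue the split pieces back together
      merged = merged $i
    line = $1 OFS merged
    for (i = 3 + extra; i <= NF; i++)          # append the remaining real columns
      line = line OFS $i
    $0 = line
  }
  { print }' "$@"
}
```

It reads the files given as arguments, or standard input if none are given, for example `merge_stray_commas file.csv > fixed.csv`. Rows that already have 5 fields pass through unchanged.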
Upvotes: 3
Reputation: 12347
Use this Perl one-liner:
perl -F',' -lane 'if ( @F > 5 ) { $_ = join ",", $F[0], "$F[1]$F[2]", @F[3..$#F]; } print;' in_file
Output:
4,abc,323,123,acvd
4,abc,323,123,acvd
4,abc,323,123,acvd
4,dbc,323,123,acvd
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
SEE ALSO:
perldoc perlrun : how to execute the Perl interpreter: command line switches
Upvotes: 1