Remove extra commas from only 2nd and 3rd row of CSV file

Question

I have a comma delimited file (CSV file) test.csv as shown below.

FHEAD,1,2,3,,,,,,
FDEP,2,3,,,,,,,,
FCLS,3,,,4-5,,,,,,,
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,

I wanted to remove the empty columns only from 2nd and 3rd row of the file i.e. were ever the records starts with FDEP and FCLS only in those rows I wanted to remove the empty columns (,,).

after removing the empty columns the same file test.csv should look like

FHEAD,1,2,3,,,,,,
FDEP,2,3
FCLS,3,4-5
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,

How can I do this in Unix???

melpomene · Accepted Answer

Here's one way to do it, using sed:

sed '/^F$DEP\|CLS$,/ { s/,\{2,\}/,/g; s/,$// }'

We use a range of /^F$DEP\|CLS$,/, i.e. the following command will only process lines matching ^F$DEP\|CLS$,. This regex matches beginning-of-string, followed by F, followed by either DEP or CLS, followed by ,. In other words, we look for lines starting with FDEP, or FCLS,.

Having found such a line, we first substitute (s command) all runs (g flag, match as many times as possible) of 2 or more (\{2,\}) commas (,) in a row by a single ,. This squeezes ,,, down to a single ,.

Second, we substitute , at end-of-string by nothing. This gets rid of any trailing comma.

Remove extra commas from only 2nd and 3rd row of CSV file

Answers (1)

Related Questions