Mert Nuhoglu

Reputation: 10133

Adding commas when necessary to a csv file using regex

I have a csv file like the following:

entity_name,data_field_name,type
Unit,id
Track,id,LONG

The second row is missing a comma. Is there a regex or an awk-like tool that can append commas to the end of the rows that are missing them?
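Before rewriting anything, it can help to list which rows are short. The sketch below assumes the header row has the correct number of columns; the function name `check_csv` is hypothetical. It reads CSV on standard input and reports every row whose field count differs from the header's. Blank lines would be reported as having 0 fields.

```shell
# Hypothetical helper: check_csv < file
# Prints one line per row whose field count differs from the header row's.
check_csv() {
  awk -F, '
    NR == 1 { nc = NF; next }   # remember the header column count
    NF != nc { printf "line %d: %d fields (expected %d)\n", NR, NF, nc }
  '
}
```

For the sample file, `check_csv < file` should report only line 2 (`Unit,id`).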

Update

I know the requirements are a little vague. There are several ways to narrow them down, for example:

  1. The header row should define the number of columns (and commas) that are valid for the whole file. The script should read the header row first and determine the correct number of columns.
  2. The number of columns might be passed as an argument to the script.
  3. The number of columns can be hardcoded into the script.
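All three alternatives can be covered by one awk program. The following is a minimal sketch, not taken from any of the answers: the function name `pad_csv` and its optional argument are hypothetical. With no argument it takes the column count from the header row (alternative 1); with an argument it pads to that count (alternatives 2 and 3, since a hard-coded value can simply be passed in). It relies on the POSIX awk behaviour that assigning to a field beyond `NF` creates the missing empty fields and rebuilds the line with `OFS`.

```shell
# Hypothetical helper: pad_csv [NCOLS] < file
# No argument: the header row defines the column count (alternative 1).
# NCOLS given: pad every row, including the header, to NCOLS columns.
pad_csv() {
  awk -v nc="${1:-0}" '
    BEGIN { FS = OFS = ","; nc += 0 }   # force nc to be numeric
    NR == 1 && nc == 0 { nc = NF }      # take the column count from the header
    NF && NF < nc { $nc = $nc }         # assigning field nc adds empty fields up to nc
    { print }                           # rows that are long enough are printed unchanged
  '
}
```

Usage: `pad_csv < file` uses the header; `pad_csv 4 < file` forces four columns. Empty lines are left alone because of the `NF &&` guard.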

I didn't narrow down the requirements at first because any of them would have been OK for me. Of course, the first alternative is the best, but I wasn't sure whether it was easy to implement.

Thanks for all the great answers and comments. Next time, I will state acceptable alternative requirements explicitly.

Upvotes: 2

Views: 314

Answers (6)

anubhava

Reputation: 784998

You can use this awk command to pad every row after the header with empty cells, based on the number of columns in the header row, so the column count does not have to be hard-coded:

awk 'BEGIN{FS=OFS=","} NR==1{nc=NF} NF{$nc=$nc} 1' file

entity_name,data_field_name,type
Unit,id,
Track,id,LONG
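The command above prints to standard output. awk has no portable in-place option (GNU awk 4.1 and later provide `-i inplace`, but other awks do not), so one way to update the file itself is to write to a temporary file and copy it back. This is a sketch only; the function name `pad_in_place` is hypothetical, and `mktemp` is assumed to be available (it is on GNU, BSD and BusyBox systems, though it is not in POSIX).

```shell
# Hypothetical helper: pad_in_place FILE
# Runs the awk one-liner from this answer and replaces FILE with the result.
pad_in_place() {
  tmp=$(mktemp) || return 1
  # Copy back with cat (rather than mv) so FILE keeps its own permissions;
  # if awk fails, && skips the copy and FILE is left untouched.
  awk 'BEGIN{FS=OFS=","} NR==1{nc=NF} NF{$nc=$nc} 1' "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```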

Earlier solution:

awk 'BEGIN{FS=OFS=","} NR==1{nc=NF} {printf "%s", $0;
  for (i=NF+1; i<=nc; i++) printf "%s", OFS; print ""}' file

Upvotes: 3

Ed Morton

Reputation: 203229

This MIGHT be all you need, depending on the info you haven't shared with us in your question:

$ awk -F, '{print $0 (NF<3?FS:"")}' file
entity_name,data_field_name,type
Unit,id,
Track,id,LONG
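The `NF<3` test appends exactly one comma, which fits the sample data. If a row could be missing more than one field (for example a bare `Unit`), one possible variant, still hard-coding 3 columns, appends one comma per missing field without rebuilding `$0`. The function name `pad_to_three` is hypothetical, and blank lines are deliberately left empty here.

```shell
# Hypothetical helper: pad_to_three < file
# Appends one comma (FS) per missing field so every non-empty row has 3 fields.
pad_to_three() {
  awk -F, '{
    pad = ""
    if (NF > 0)
      for (i = NF; i < 3; i++) pad = pad FS
    print $0 pad   # $0 itself is never modified, so its separators are kept as-is
  }'
}
```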

Upvotes: 1

Lieven Keersmaekers

Reputation: 58431

To balance all the awk solutions, here is a Vim-only solution:

:v/,.*,/norm A,

rationale

/,.*,/          matches lines containing at least two commas
:v              runs the following command on every line NOT matching that pattern
norm A,         executes the normal-mode command A, (append a , at the end of the line)
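The same "every line that does not match" idea carries over to sed, where `!` after an address negates it. The sketch below is a translation of the Vim command, not something from the answer; the function name `vim_style_pad` is hypothetical. Like the Vim version, it appends a single comma, so a line with no commas (or an empty line) gets only one.

```shell
# Hypothetical helper: vim_style_pad < file
# On every line NOT containing two commas, append a comma at the end.
vim_style_pad() {
  sed '/,.*,/!s/$/,/'
}
```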

Upvotes: 1

sat

Reputation: 14949

With another awk:

awk -F, 'NF==2{$3=""}1' OFS=, yourfile.csv

Upvotes: 1

Ren

Reputation: 2946

Try this:

$ awk -F , -v OFS=, 'NF==2{$2=$2","}1' file

Output:

entity_name,data_field_name,type
Unit,id,
Track,id,LONG
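Why the output separator matters here: assigning to `$2` makes awk rebuild the whole record joined with `OFS`, which defaults to a single space. Without setting `OFS` to a comma, the second row would print as `Unit id,` rather than `Unit,id,`. The small demonstration below (the function name `demo_ofs` is hypothetical) prints both cases.

```shell
# Hypothetical demo: assigning to any field rebuilds $0 using OFS.
demo_ofs() {
  echo 'Unit,id' | awk -F, '{ $2 = $2 "," } 1'           # default OFS (" ") -> Unit id,
  echo 'Unit,id' | awk -F, -v OFS=, '{ $2 = $2 "," } 1'  # OFS set to ","    -> Unit,id,
}
```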

Upvotes: 1

Avinash Raj

Reputation: 174696

I would use sed:

sed 's/^[^,]*,[^,]*$/&,/' file

Example:

$ echo 'Unit,id' | sed 's/^[^,]*,[^,]*$/&,/'
Unit,id,
$ echo 'Unit,id,bar' | sed 's/^[^,]*,[^,]*$/&,/'
Unit,id,bar
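The regex only matches rows with exactly one comma. If rows with a single field can also occur in a 3-column file, one possible extension (a sketch, not part of the answer; the function name `pad_sed` is hypothetical) adds a first expression that appends two commas to non-empty lines with no comma at all. After that expression runs, such a line has two commas, so the second expression no longer matches it.

```shell
# Hypothetical helper: pad_sed < file   (assumes exactly 3 columns)
# 1st expression: non-empty line with no comma  -> append ",,"
# 2nd expression: line with exactly one comma   -> append ","
pad_sed() {
  sed -e 's/^[^,][^,]*$/&,,/' -e 's/^[^,]*,[^,]*$/&,/'
}
```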

Upvotes: 1
