Maurice Amar

Reputation: 139

Append delimiters for implied blank fields

I am looking for a simple way to make every line of a CSV file contain the same number of commas.

Example file:

1,1
A,B,C,D,E,F
2,2,
3,3,3,
4,4,4,4

Expected output:

1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,

The line with the most commas (line #2 in this case) has 5 commas, so I want to append commas to all the other lines until every line has the same number (i.e. 5 commas).

Upvotes: 0

Views: 67

Answers (3)

RavinderSingh13

Reputation: 133650

Could you please try the following, a more generic approach. It works even when the number of fields differs from line to line in your Input_file. On the first read of the file it finds the maximum number of fields; on the second read it reassigns field number nf on each line. Because OFS is set to a comma, any line with fewer fields than nf gets the missing commas appended. This is an enhanced version of @oguz ismail's answer.

awk '
BEGIN{
 FS=OFS=","
}
FNR==NR{
 nf=nf>NF?nf:NF
 next
}
{
 $nf=$nf
}
1
'  Input_file  Input_file

Explanation: a detailed, line-by-line explanation of the code above.

awk '                ## Start of the awk program.
BEGIN{               ## The BEGIN block runs once, before any input is read.
 FS=OFS=","          ## Use a comma as both the input and the output field separator.
}
FNR==NR{             ## TRUE only while the first copy of Input_file is being read.
 nf=nf>NF?nf:NF      ## Track the maximum field count: keep nf if it is greater than NF, otherwise set it to NF.
 next                ## Skip the remaining rules for this line.
}
{
 $nf=$nf             ## Assigning to field nf makes awk rebuild the line with OFS, appending commas when NF is less than nf.
}
1                    ## 1 is an always-true pattern, so every line (edited or not) is printed.
' Input_file Input_file      ## Input_file is named twice so that it is read twice.
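
One way to check the result is to count how many distinct field counts remain after padding; a uniform file yields exactly one. A rough sketch, assuming the sample data from the question is written to a temporary file (the shell variables input, result and counts are illustrative names, not part of the answer above):

```shell
# Recreate the sample input from the question in a temporary file
input=$(mktemp)
printf '%s\n' '1,1' 'A,B,C,D,E,F' '2,2,' '3,3,3,' '4,4,4,4' > "$input"

# Same two-pass logic as above: first read finds the max NF, second read pads
result=$(awk '
BEGIN { FS = OFS = "," }
FNR == NR { nf = (nf > NF ? nf : NF); next }
{ $nf = $nf }
1
' "$input" "$input")

# Print each distinct field count found in the padded output, once
counts=$(printf '%s\n' "$result" | awk -F, '!seen[NF]++ { print NF }')
printf 'distinct field counts: %s\n' "$counts"

rm -f "$input"
```

If more than one number is printed, at least one line still has a different field count from the rest.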

Upvotes: 2

David C. Rankin

Reputation: 84579

Another take on making all lines in a CSV file have the same number of fields. The number of fields need not be known in advance: the maximum field count is calculated, and a substring of the needed commas is appended to each record, e.g.

awk -F, -v max=0 '{
    lines[n++] = $0             # store lines indexed by line number
    fields[lines[n-1]] = NF     # store field count, keyed by the line text ($0)
    if (NF > max)               # find max NF value
        max = NF
}
END {
    for(i=0;i<max;i++)          # form string with max commas
        commastr=commastr","
    for(i=0;i<n;i++)            # append the needed number of commas to each line
        printf "%s%s\n", lines[i], substr(commastr,1,max-fields[lines[i]])
}' file

Example Use/Output

Pasting the command at the command line produces:

$ awk -F, -v max=0 '{
>     lines[n++] = $0             # store lines indexed by line number
>     fields[lines[n-1]] = NF     # store field count, keyed by the line text ($0)
>     if (NF > max)               # find max NF value
>         max = NF
> }
> END {
>     for(i=0;i<max;i++)          # form string with max commas
>         commastr=commastr","
>     for(i=0;i<n;i++)            # append the needed number of commas to each line
>         printf "%s%s\n", lines[i], substr(commastr,1,max-fields[lines[i]])
> }' file
1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,

Upvotes: 2

oguz ismail

Reputation: 50795

Using awk:

$ awk 'BEGIN{FS=OFS=","} {$6=$6} 1' file
1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,

Note that with this approach the maximum number of fields (6 here) must be hardcoded in the command.
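
If the widest line is not known in advance, one option is to compute the field count in a separate pass and hand it to the same command through -v. A minimal sketch, assuming the sample data from the question (the temporary file and the names max, n and padded are chosen here for illustration):

```shell
# Recreate the sample input from the question in a temporary file
input=$(mktemp)
printf '%s\n' '1,1' 'A,B,C,D,E,F' '2,2,' '3,3,3,' '4,4,4,4' > "$input"

# Pass 1: find the largest field count in the file (0 for an empty file)
max=$(awk -F, 'NF > m { m = NF } END { print m + 0 }' "$input")

# Pass 2: assign to field n so awk rebuilds each record with OFS
padded=$(awk -v n="$max" 'BEGIN { FS = OFS = "," } { $n = $n } 1' "$input")
printf '%s\n' "$padded"

rm -f "$input"
```

This reads the file twice, like the two-pass answers, but keeps the padding command itself identical to the one-liner above.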

Upvotes: 2
