Reputation: 139
I am looking for a simple way to give every line of a CSV file the same number of commas.
Example file:
1,1
A,B,C,D,E,F
2,2,
3,3,3,
4,4,4,4
Expected output:
1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,
The line with the most commas has 5 in this case (line #2), so I want to append commas to every other line until each line has the same number (i.e. 5 commas).
Upvotes: 0
Views: 67
Reputation: 133650
Could you please try the following, a more generic approach. It works even when the number of fields differs between the lines of your Input_file. The file is read twice: the first pass finds the maximum number of fields in the whole file, and the second pass resets the field at that position on every line. Because OFS is set to a comma, any line with fewer fields than the nf value is rebuilt with the missing commas appended. This is an enhanced version of @oguz ismail's answer.
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
nf=nf>NF?nf:NF
next
}
{
$nf=$nf
}
1
' Input_file Input_file
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of awk program from here.
FS=OFS="," ##Setting FS and OFS as comma for all lines here.
}
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when first time Input_file is being read.
nf=nf>NF?nf:NF ##Tracking the maximum field count in nf: if nf is greater than NF then keep nf as it is, else set it to NF.
next ##next will skip all further statements from here.
}
{
$nf=$nf ##Assigning $nf to itself rebuilds the current line with OFS and adds comma(s) at the end of the line if NF is less than nf.
}
1 ##1 will print edited/non-edited lines here.
' Input_file Input_file ##Mentioning Input_file names here.
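To see the effect of the $nf=$nf assignment in isolation, here is a minimal sketch that uses a hardcoded field number instead of the computed nf (5 is an arbitrary value chosen only for illustration). Assigning to a field beyond NF makes awk create the missing empty fields and rejoin the record with OFS:

```shell
# One input line with 2 fields; assigning to field 5 extends it to 5 fields.
# The value 5 is an illustrative assumption, not taken from the answer above.
printf 'a,b\n' | awk 'BEGIN { FS = OFS = "," } { $5 = $5 } 1'
# prints: a,b,,,
```

A line that already has 5 or more fields is left with its existing field count, which is why the first pass has to find the maximum.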
Upvotes: 2
Reputation: 84579
Another take on making all lines in a CSV file have the same number of fields. The number of fields need not be known in advance. The maximum
field count is calculated and a substring of the needed commas is appended to each record, e.g.
awk -F, -v max=0 '{
lines[n++] = $0 # store lines indexed by line number
fields[lines[n-1]] = NF # store number of fields indexed by $0
if (NF > max) # find max NF value
max = NF
}
END {
for(i=0;i<max;i++) # form string with max commas
commastr=commastr","
for(i=0;i<n;i++) # loop appending substring of commas
printf "%s%s\n", lines[i], substr(commastr,1,max-fields[lines[i]])
}' file
Example Use/Output
Pasting at the command-line, you would receive:
$ awk -F, -v max=0 '{
> lines[n++] = $0 # store lines indexed by line number
> fields[lines[n-1]] = NF # store number of fields indexed by $0
> if (NF > max) # find max NF value
> max = NF
> }
> END {
> for(i=0;i<max;i++) # form string with max commas
> commastr=commastr","
> for(i=0;i<n;i++) # loop appending substring of commas
> printf "%s%s\n", lines[i], substr(commastr,1,max-fields[lines[i]])
> }' file
1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,
Upvotes: 2
Reputation: 50795
Using awk:
$ awk 'BEGIN{FS=OFS=","} {$6=$6} 1' file
1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,
As you can see above, this approach requires the maximum number of fields to be hardcoded in the command.
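If hardcoding the field count is not an option, one possible variation is to compute it in a first awk pass and hand it to the same command through -v. This is only a minimal sketch, assuming the input is a regular file that can be read twice (not a pipe); pad_csv and the variable n are hypothetical names introduced here for illustration:

```shell
# pad_csv FILE: hypothetical helper, pads every line of FILE to the same field count.
pad_csv() {
    # First pass: find the largest field count in the file (0 for an empty file).
    n=$(awk -F, 'NF > max { max = NF } END { print max + 0 }' "$1")
    # Second pass: assigning field n to itself makes awk rebuild short lines with OFS.
    awk -v n="$n" 'BEGIN { FS = OFS = "," } { $n = $n } 1' "$1"
}
```

Calling pad_csv file on the sample input from the question should give the same five-comma output as the hardcoded $6=$6 version.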
Upvotes: 2