How to sum up all other columns based on column 1?

Question

I have an example csv file like below (but with way more columns numbering up to Sample 100 and several rows)

Genus,Sample1,Sample2,Sample3
Unclassified,0,1,44
Unclassified,0,0,392
Unclassified,0,0,0
Woeseia,0,0,76

and I would like to have a summed csv file as below where all the identical entries on column 1 are summed up

Genus,Sample1,Sample2,Sample3
Unclassified,0,1,436
Woeseia,0,0,76

I tried the following awk code but it didn't work

awk  -F "," 'function SP()  {n=split ($0, T); ID=$1}
         function PR()  {printf "%s", ID; for (i=2; i<=n; i++) printf "	%s", T[i]; printf "
"}

         NR==1          {SP();next}
         $1 != ID       {PR(); SP(); next}
                        {for (i=2; i<=NF; i++) T[i]+=$i}
         END            {PR()}
        ' Filename.csv

I am also aware of doing something like below but it is impractical when there are hundreds of columns. Any help here would be appreciated.

awk -F "," ' NR==1 {print; next} NF {a[$1]+=$2; b[$1]+=$3; c[$1]+=$4; d[$1]+=$5; e[$1]+=$6; f[$1]++} END {for(i in a)print i, a[i], b[i], c[i], d[i], e[i], f[i]} ' Filename.csv

RavinderSingh13 · Accepted Answer

With your shown samples, please try following awk program. You need NOT to create these many arrays, you could easily do it with 1 or 2 here.

awk '
BEGIN { FS=OFS="," }
FNR==1{
  print
  next
}
{
  for(i=2;i<=NF;i++){
    arr1[$1]
    arr2[$1,i]+=$i
  }
}
END{
  for(i in arr1){
    printf("%s,",i)
    for(j=2;j<=NF;j++){
      printf("%s%s",arr2[i,j],j==NF?ORS:OFS)
    }
  }
}
'  Input_file

Output will be as follows:

Genus,Sample1,Sample2,Sample3
Unclassified,0,1,436
Woeseia,0,0,76

Explanation: Adding detailed explanation for above code.

awk '                         ##Starting awk program from here.
BEGIN { FS=OFS="," }          ##In BEGIN section setting FS and OFS as comma here.
FNR==1{                       ##Checking if this is first line then do following.
  print                       ##Printing current line.
  next                        ##next will skip further statements from here.
}
{
  for(i=2;i<=NF;i++){         ##Running for loop from 2nd field to till NF here.
    arr1[$1]                  ##Creating arr1 array with index of 1st field.
    arr2[$1,i]+=$i            ##Creating arr2 with index of 1st field and current field number and value is current field value which is keep adding into it.
  }
}
END{                          ##Starting END block for this program from here.
  for(i in arr1){             ##Traversing through arr1 all elements here one by one.
    printf("%s,",i)           ##Printing its current index here.
    for(j=2;j<=NF;j++){       ##Running for loop from 2nd field to till NF here.
      printf("%s%s",arr2[i,j],j==NF?ORS:OFS) ##Printing value of arr2 with index of i and j, printing new line if its last field.
    }
  }
}
'  Input_file                 ##Mentioning Input_file here.

How to sum up all other columns based on column 1?

Answers (2)

Related Questions