Unix, group rows and sum values of columns from file.csv

Question

I have this file.csv

"201707"|"51976551"|1|0|1|"20170702"
"201707"|"51955194"|1|0|0|"20170702"
"201707"|"51923555"|1|0|1|"20170702"
"201707"|"51976551"|1|0|1|"20170703"
"201707"|"51955194"|1|0|0|"20170703"
"201707"|"51923555"|1|0|1|"20170703"
"201707"|"51960597"|1|0|0|"20170703"

And my hope result is group by the number and sum the column 3, 4 and 5

"201707"|"51976551"|2|0|2
"201707"|"51955194"|2|0|0
"201707"|"51923555"|2|0|2
"201707"|"51960597"|1|0|0

I've tried with:

cat file.csv | awk -F"|" '
  { a[$2] += $3 }
  END {
    for (i in a) {
      printf "%s|%s
", i, a[i];
    }
  }
'

And the result is:

"51976551"|2
"51955194"|2
"51923555"|2
"51960597"|1

Only shows the sum of third column, but I need 2 columns more. what should I do in this case?

John1024 · Accepted Answer

Try:

$ awk -F"|" '{ a[$1 OFS $2]+=$3; b[$1 OFS $2]+=$4; c[$1 OFS $2]+=$5 }
  END {
    for (i in a) {
      print i, a[i], b[i], c[i];
    }
  }
' OFS=\| file.csv
"201707"|"51976551"|2|0|2
"201707"|"51960597"|1|0|0
"201707"|"51923555"|2|0|2
"201707"|"51955194"|2|0|0

How it works

-F"|"

This sets the field separator on input to |.
a[$1 OFS $2]+=$3; b[$1 OFS $2]+=$4; c[$1 OFS $2]+=$5

This keeps track of the totals of the third, fourth, and fifth columns.
END { for (i in a) { print i, a[i], b[i], c[i]; } }

This prints out the results.
OFS=\|

This tells awk to use | as the field separator on output.

Unix, group rows and sum values of columns from file.csv

Answers (2)

How it works

Related Questions