user1298426

Reputation: 3717

awk to do group by sum of column

I have this CSV file and I am trying to write a shell script that calculates the sum of a column after doing a group by on it. The column is the 11th (STATUS).

My script is

awk -F, 'NR>1{arr[$11]++}END{for (a in arr) print a, arr[a]}' $f > $parentdir/outputfile.csv;

File output expected is

COMMITTED 2

but actual output is just 2.

It prints only the count, not the group-by key. If I delete the other columns and run the same query, it works fine, but not with the sample data below.

FILE NAME;SEQUENCE NR;TRANSACTION ID;RUN NUMBER;START EDITCREATION;END EDITCREATION;END COMMIT;EDIT DURATION;COMMIT DURATION;HAS DEPENDENCY;STATUS;DETAILS
Buldhana_Refinesource_FG_IW_ETS_000001.xml;1;4a032127-b20d-4fa8-9f4d-7f2999c0c08f;1;20180831130210345;20180831130429638;20180831130722406;140;173;false;COMMITTED;
Buldhana_Refinesource_FG_IW_ETS_000001.xml;2;e4043fc0-3b0a-46ec-b409-748f98ce98ad;1;20180831130722724;20180831130947144;20180831131216693;145;150;false;COMMITTED;
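A likely cause of the bare `2` (this is an inference from the sample data, which is semicolon-delimited, not comma-delimited): with `-F,` each whole line is a single field, so `$11` expands to the empty string and every row is counted under the empty-string key, which prints as nothing before the count. A minimal demonstration:

```shell
# With a comma field separator, a semicolon-delimited line is one field:
# NF is 1 and $11 is the empty string, so the group key is invisible.
printf 'a;b;COMMITTED;\n' | awk -F, '{print NF, "[" $11 "]"}'
# → 1 []
```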

Upvotes: 0

Views: 1736

Answers (2)

justaguy

Reputation: 3022

Change the FS to ; in your script:

awk -F';' 'NR>1{arr[$11]++}END{for (a in arr) print a, arr[a]}' file

COMMITTED 2
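If the goal really is a sum rather than a count, as the question title suggests, a sketch that totals the EDIT DURATION column per STATUS (assuming EDIT DURATION is field 8 in this layout):

```shell
# Group by field 11 (STATUS) and sum field 8 (EDIT DURATION);
# NR>1 skips the header line.
awk -F';' 'NR>1 { sum[$11] += $8 } END { for (s in sum) print s, sum[s] }' file
# For the sample data above: COMMITTED 285  (140 + 145)
```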

Upvotes: 2

dibery

Reputation: 3460

You're using the wrong field separator. Use

awk -F\;

The ; must be escaped (or quoted) to pass it to awk as a literal. Apart from that, your approach seems OK.
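Any quoting form that gets a literal `;` past the shell behaves the same; a quick check (the sample input here is made up):

```shell
# All three pass a literal semicolon to awk as the field separator
# and print the second field, "b":
echo 'a;b;c' | awk -F\;  '{print $2}'   # backslash escape
echo 'a;b;c' | awk -F';' '{print $2}'   # single quotes
echo 'a;b;c' | awk -F ';' '{print $2}'  # separate quoted argument
```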


Besides awk, you may also use

tail -n +2 $f | cut -f11 -d\; | sort | uniq -c

or

datamash --header-in -t \; -g 11 count 11 < $f

to do the same thing.

Upvotes: 1