owwoow14

Reputation: 1754

Sum frequency of one column using command line

I need to sum up the frequency in one column of a large tab-separated file.

An example is:

abbot   achievement 1
abbot   acknowledge 2
abbot   acknowledge 2
abbot   acknowledge 3
abbot   acquaintance    1
abbot   acquire 2
abbot   acquisition 2
abbot   acre    1
abbot   acre    4
abbot   act 1
abbot   act 4
abbot   act 3
abbot   act 8
abbot   act 5
abbot   act 7
abbot   act 8
abbot   action  2
abbot   active  4

I want to sum the frequencies of all rows whose columns 1 and 2 are identical, so that the final result is:

abbot   achievement 1
abbot   acknowledge 7
abbot   acquaintance    1
abbot   acquire 2
abbot   acquisition 2
abbot   acre    5
abbot   act 36
abbot   action  2
abbot   active  4

I have asked a similar question here, and used the following command:

$ sort input.txt | uniq -c | awk '{ print $2 "\t" $3 "\t" $1*$4 }'

However, this does not solve the problem: sort | uniq -c only collapses lines in which all three columns are identical (adding the count as a new first column), so rows that share columns 1 and 2 but have different frequencies are never summed together.

Can anyone suggest a modification to this command that will produce my desired result? Or perhaps suggest a better path to solve this problem?

Upvotes: 0

Views: 263

Answers (1)

Jotne

Reputation: 41456

Use awk and sum the third column into an array keyed on the first two columns:

awk '{ a[$1 FS $2]+=$3 } END {for (i in a) print i,a[i] }' file
abbot active 4
abbot action 2
abbot achievement 1
abbot acre 5
abbot acquire 2
abbot acknowledge 7
abbot acquisition 2
abbot act 36
abbot acquaintance 1
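
Note that the order in which `for (i in a)` visits the keys is unspecified, and the one-liner above prints space-separated output. If you need tab-separated, sorted output like in the question, a minimal sketch could look like the following. The function name `sum_by_pair` is just a hypothetical helper for illustration, and it assumes the input really is tab-separated with exactly three columns:

```shell
# Hypothetical helper: sum column 3 for each (column 1, column 2) pair.
# Assumes a tab-separated input file with three columns.
sum_by_pair() {
    awk -F'\t' -v OFS='\t' '
        { sum[$1 OFS $2] += $3 }                 # key is "col1<TAB>col2"
        END { for (k in sum) print k, sum[k] }   # key order is unspecified here
    ' "$1" | LC_ALL=C sort                       # sort for a stable, byte-wise order
}
```

You would call it as `sum_by_pair input.txt`. Setting both `-F` and `OFS` to a tab keeps the output in the same format as the input, and piping through `sort` makes up for awk not guaranteeing any order in the `END` loop.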

Upvotes: 1
