Joseph Ivan Hayhoe
Joseph Ivan Hayhoe

Reputation: 113

Unix: Count occurrences of similar entries in first column, sum the second column

I have a file with two columns of data, I would like to count the occurrence of similarities in the first column. When two similar entries in the first column are matched, I would like to also sum the value of the second column of the two matched entries.

Example list:

2013-11-13-03 1
2013-11-13-06 1
2013-11-13-13 2
2013-11-13-13 1
2013-11-13-15 1
2013-11-13-15 1
2013-11-13-15 1
2013-11-13-17 1
2013-11-13-23 1
2013-11-14-01 1
2013-11-14-04 6
2013-11-14-07 1
2013-11-14-08 1
2013-11-14-09 1
2013-11-14-09 1

I would like the output to read similar to the following

2013-11-13-03 1 1
2013-11-13-06 1 1
2013-11-13-13 2 3
2013-11-13-15 3 3
2013-11-13-17 1 1
2013-11-13-23 1 1
2013-11-14-01 1 1
2013-11-14-04 1 6
2013-11-14-07 1 1
2013-11-14-08 1 1
2013-11-14-09 2 2

Column 1 is the matched columns from the earlier example column 1, column 2 is the count of matches of column 1 from the earlier example (1 if no other matches), column 3 is the sum of column 2 from the matched column 1 entries from the earlier example. Anyone have any tips on completing this using awk or a mixture of uniq and awk?

Upvotes: 5

Views: 1517

Answers (2)

damienfrancois
damienfrancois

Reputation: 59180

Here a pure Bash solution

$ cat t
2013-11-13-03 1
2013-11-13-06 1
2013-11-13-13 2
2013-11-13-13 1
2013-11-13-15 1
2013-11-13-15 1
2013-11-13-15 1
2013-11-13-17 1
2013-11-13-23 1
2013-11-14-01 1
2013-11-14-04 6
2013-11-14-07 1
2013-11-14-08 1
2013-11-14-09 1
2013-11-14-09 1
$ declare -A SUM CNT
$ while read ts vl; do (( SUM[$ts]=+$vl )) ; (( CNT[$ts]++ )); done < t
$ for i in "${!CNT[@]}"; do   echo "$i ${CNT[$i]} ${SUM[$i]} "; done | sort
2013-11-13-03 1 1 
2013-11-13-06 1 1 
2013-11-13-13 2 3 
2013-11-13-15 3 3 
2013-11-13-17 1 1 
2013-11-13-23 1 1 
2013-11-14-01 1 1 
2013-11-14-04 1 6 
2013-11-14-07 1 1 
2013-11-14-08 1 1 
2013-11-14-09 2 2 

Upvotes: 1

danfuzz
danfuzz

Reputation: 4353

Here's a quickie with awk and sort:

awk '
{
    counts[$1]++;     # Increment count of lines.
    totals[$1] += $2; # Accumulate sum of second column.
}
END {
    # Iterate over all first-column values.
    for (x in counts) {
        print x, counts[x], totals[x];
    }
}
' file.txt | sort

You can skip the sort if you don't care about the order of output lines.

Upvotes: 5

Related Questions