Unix: Count occurrences of similar entries in first column, sum the second column

Question

I have a file with two columns of data, I would like to count the occurrence of similarities in the first column. When two similar entries in the first column are matched, I would like to also sum the value of the second column of the two matched entries.

Example list:

2013-11-13-03 1
2013-11-13-06 1
2013-11-13-13 2
2013-11-13-13 1
2013-11-13-15 1
2013-11-13-15 1
2013-11-13-15 1
2013-11-13-17 1
2013-11-13-23 1
2013-11-14-01 1
2013-11-14-04 6
2013-11-14-07 1
2013-11-14-08 1
2013-11-14-09 1
2013-11-14-09 1

I would like the output to read similar to the following

2013-11-13-03 1 1
2013-11-13-06 1 1
2013-11-13-13 2 3
2013-11-13-15 3 3
2013-11-13-17 1 1
2013-11-13-23 1 1
2013-11-14-01 1 1
2013-11-14-04 1 6
2013-11-14-07 1 1
2013-11-14-08 1 1
2013-11-14-09 2 2

Column 1 is the matched columns from the earlier example column 1, column 2 is the count of matches of column 1 from the earlier example (1 if no other matches), column 3 is the sum of column 2 from the matched column 1 entries from the earlier example. Anyone have any tips on completing this using awk or a mixture of uniq and awk?

danfuzz · Accepted Answer

Here's a quickie with awk and sort:

awk '
{
    counts[$1]++;     # Increment count of lines.
    totals[$1] += $2; # Accumulate sum of second column.
}
END {
    # Iterate over all first-column values.
    for (x in counts) {
        print x, counts[x], totals[x];
    }
}
' file.txt | sort

You can skip the sort if you don't care about the order of output lines.

Unix: Count occurrences of similar entries in first column, sum the second column

Answers (2)

Related Questions