Madza Farias-Virgens

Reputation: 1071

Counting unique occurrences in each column

I have a file with several columns ($2, $3, and so on up to $32), as in

A refdevhet devdevhomo
B refdevhet refdevhet
C refrefhomo refdevhet
D devrefhet  refdevhet

I need to count the occurrences of each unique element in each column separately,

so that I have

refdevhet  2 3
refrefhomo 1 0
devrefhet  1 0
devdevhomo 0 1

I tried several variations of

awk 'BEGIN {
  FS=OFS="\t"
}

{
  for(i=1; i<=32; i++) a[$i]++
}

END {
  for (i in a) print i, a[i]
}' file

but instead it prints the cumulative count of each unique element summed across all the selected fields, not per column.
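The cause of that behavior: `a[$i]++` keys the array on the field value alone, so counts from every column land in the same bucket. A minimal sketch of the merged result on the sample data (restricted to `i<=NF` so the empty fields `$4`..`$32` are not counted under the empty-string key; the file is assumed tab-separated):

```shell
# Recreate the sample input from the question, tab-separated.
printf 'A\trefdevhet\tdevdevhomo\nB\trefdevhet\trefdevhet\nC\trefrefhomo\trefdevhet\nD\tdevrefhet\trefdevhet\n' > file

# a[$i]++ is keyed on the value only, so the per-column counts merge:
# refdevhet appears 2 times in column 2 and 3 times in column 3,
# but comes out as a single count of 5.
merged=$(awk 'BEGIN{FS=OFS="\t"} {for(i=2;i<=NF;i++) a[$i]++} END{for(k in a) print k, a[k]}' file | sort)
echo "$merged"
```

This is why the fix in both answers below keys the counter on the (value, column) pair instead of the value alone.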

Upvotes: 5

Views: 136

Answers (2)

glenn jackman

Reputation: 247230

In addition to @Andriy's good answer, with GNU awk you can use a two-dimensional array:

gawk '
  {for (i=2; i<=NF; i++) count[$i][i]++}
  END {
    for (word in count) {
      printf "%s", word
      for (i=2; i<=NF; i++) printf "%s%d", OFS, count[word][i]
      print ""
    }
  }
' file | column -t

I'm assuming here that each line has the same number of fields as the last line.

Upvotes: 4

Andriy Makukha

Reputation: 8344

Here is a solution:

BEGIN {
    FS = OFS = "\t"
}
{
    if (NF > mxf) mxf = NF
    for (i = 1; i <= NF; i++) { ws[$i] = 1; c[$i, i]++ }
}
END {
    for (w in ws) {
        printf "%s", w
        for (i = 1; i <= mxf; i++) printf "%s%d", OFS, c[w, i]
        print ""
    }
}

Note that this solution is general: it works with any POSIX awk (the `c[$i, i]` subscripts use SUBSEP-joined keys rather than gawk's arrays of arrays), and it takes the first column into consideration as well. To omit the first column, change i=1 to i=2 in both places.
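For instance, with the script saved as `count.awk` (a hypothetical filename) and i=2 in both places to skip the label column, a run on the sample data can be sketched as:

```shell
# The script above with i=2 in both places, so the first (label) column is skipped.
cat > count.awk <<'EOF'
BEGIN { FS = OFS = "\t" }
{
    if (NF > mxf) mxf = NF
    for (i = 2; i <= NF; i++) { ws[$i] = 1; c[$i, i]++ }
}
END {
    for (w in ws) {
        printf "%s", w
        for (i = 2; i <= mxf; i++) printf "%s%d", OFS, c[w, i]
        print ""
    }
}
EOF

# Sample input from the question, tab-separated.
printf 'A\trefdevhet\tdevdevhomo\nB\trefdevhet\trefdevhet\nC\trefrefhomo\trefdevhet\nD\tdevrefhet\trefdevhet\n' > file

# sort makes the arbitrary "for (w in ws)" order deterministic for display.
result=$(awk -f count.awk file | sort)
echo "$result" | column -t
```

Unlike the gawk answer, this tracks the maximum field count `mxf` explicitly, so lines with differing numbers of fields are handled without assuming the last line is the widest.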

Upvotes: 6
