pdg
pdg

Reputation: 27

How to loop over a file having multiple columns to execute an AWK script?

I have a file with multiple columns (greater than 1000). Each column has numbers 0, 1 or some other. The tab delimited file looks like :

0   0   0    
0   0   0    
1   2   0    
1   0   0    
1   1   0

I want to calculate the occurrence of each unique digit for each column in the file. How do I do that using AWK or shell ?

P.S To calculate the occurrence of each unique digit in first column, i used AWK code :

awk '{h[$1]++}; END {for (k in h) print k, h[k]}' file > output-file

It gives the results as :

0  2
1  3

which means 0 occurs twice in column 1 and 1 occurs thrice in column 1.

I want to do the same for a file having over 1000 columns.

Upvotes: 1

Views: 1428

Answers (2)

pakistanprogrammerclub
pakistanprogrammerclub

Reputation: 827

With awk 4.0 2D arrays

sample input matrix of n=3 columns containing integer values

0  0  0
0  0  0
1  2  0
1  0  0
1  1  0
4  0  0
7 -1 -2

output is vector of data values in column 0 that occur in input followed by matrix of n=3 columns with count of each data value in corresponding column of input matrix

-1 0 1 0 
-2 0 0 1 
 0 2 4 6 
 1 3 1 0 
 2 0 1 0 
 4 1 0 0 
 7 1 0 0 

code

awk '
NR==1 {ncols=NF}
{for(i=1; i <=NF; ++i) ++c[$i][i-1]}
END{
  for(i in c) {
    printf("%d ", i)
    for(j=0; j < ncols; ++j) {
      printf("%d ", j in c[i]?c[i][j]: 0)
    }
    printf("\n")
  }
}
'

Upvotes: 0

John1024
John1024

Reputation: 113814

You just need to make the keys for associative array h contain both column number, i, and column value, $i:

$ awk '{for (i=1;i<=NF;i++) h[i" "$i]++}; END {for (k in h) print k, h[k]}' file | sort -n
1 0 2
1 1 3
2 0 3
2 1 1
2 2 1
3 0 5

The last line above indicates that column 3 has the value 0 occurring 5 times.

In more detail:

  • for (i=1;i<=NF;i++) h[i" "$i]++

    This loops over all columns from the first, i-=1, to the last, i=NF. For each column, it updates the counter h for that column and its value.

  • END {for (k in h) print k, h[k]}

    This prints a table of the output.

  • sort -n

    Because for (k in h) does not produce keys in any particular order, we put the output through sort.

Upvotes: 2

Related Questions