Waqas Khokhar
Waqas Khokhar

Reputation: 169

Print each column count alongwith header name

I have a coma delemiated file. I am interested to count the number of rows (column count/length) in each column with their header name.

Example Dataset:

ID, IB, IM, IZ
0.05, 0.02, 0.01, 0.09
0.06, 0.01, , 0.08
0.02, 0.06,

Coumn ID:3

Column IB:3

Column IM: 1

Column IZ:2

I have tried quite few option:

I can split these columns into seperate files and then can count number of lines in each file using wc -l File_name command.

This Command is very close to what I am interested in but stillunable to get header name. Any help will be highly appreciated.

Upvotes: 0

Views: 115

Answers (3)

Ed Morton
Ed Morton

Reputation: 203665

Using any awk in any shell on every Unix box:

$ cat tst.awk
BEGIN { FS=" *, *" }
NR == 1 {
    numCols = split($0,tags)
    next
}
{
    for ( i=1; i<=NF; i++ ) {
        if ( $i ~ /./ ) {
            cnt[i]++
        }
    }
}
END {
    for ( i=1; i<=numCols; i++ ) {
        printf "Column %s:%d\n", tags[i], cnt[i]
    }
}

$ awk -f tst.awk file
Column ID:3
Column IB:3
Column IM:1
Column IZ:2

Upvotes: 0

Daweo
Daweo

Reputation: 36520

I would use GNU AWK for this task following way, let file.txt content be

ID, IB, IM, IZ
0.05, 0.02, 0.01, 0.09
0.06, 0.01, , 0.08
0.02, 0.06,

then

awk 'BEGIN{FS=","}NR==1{split($0,names);next}{for(i=1;i<=NF;i+=1){counts[i]+=$i~/[^[:space:]]/}}END{for(i=1;i<=length(names);i+=1){print "Column",names[i]": "counts[i]}}' file.txt

output

Column ID: 3
Column  IB: 3
Column  IM: 1
Column  IZ: 2

Explanation: I inform GNU AWK that , is field separator, when processing 1st record split whole lines ($0) into array names, so ID becomes names[1], IB becomes names[2], IM becomes names[3] and so on. After doing that go to next line. For all but 1st line iterate over columns using for loop, for every line increase value of counts[i] (where i is number of column) by does that column contain non-whitespace character? which is 0 for false and 1 for true. In other words increase by 1 if non-whitespace character found else increase by 0. After processing all lines iterate over names and print name with corresponding value of counts.

(tested in gawk 4.2.1)

Upvotes: 1

Renaud Pacalet
Renaud Pacalet

Reputation: 29202

With GNU awk:

awk -F'[[:space:]]*,[[:space:]]*' 'NR == 1 {header = $0; next} \
  {for(i = 1; i <= NF; i++) n[i] += ($i ~ /\S/)} \
  END {$0 = header; for(i = 1; i <= NF; i++) print "Column " $i ": " n[i]}' file
Column ID: 3
Column IB: 3
Column IM: 2
Column IZ: 1

Upvotes: 1

Related Questions