Reputation: 169
I have a coma delemiated file. I am interested to count the number of rows (column count/length) in each column with their header name.
Example Dataset:
ID, IB, IM, IZ
0.05, 0.02, 0.01, 0.09
0.06, 0.01, , 0.08
0.02, 0.06,
Coumn ID:3
Column IB:3
Column IM: 1
Column IZ:2
I have tried quite few option:
I can split these columns into seperate files and then can count number of lines in each file using wc -l File_name command.
This Command is very close to what I am interested in but stillunable to get header name. Any help will be highly appreciated.
Upvotes: 0
Views: 115
Reputation: 203665
Using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=" *, *" }
NR == 1 {
numCols = split($0,tags)
next
}
{
for ( i=1; i<=NF; i++ ) {
if ( $i ~ /./ ) {
cnt[i]++
}
}
}
END {
for ( i=1; i<=numCols; i++ ) {
printf "Column %s:%d\n", tags[i], cnt[i]
}
}
$ awk -f tst.awk file
Column ID:3
Column IB:3
Column IM:1
Column IZ:2
Upvotes: 0
Reputation: 36520
I would use GNU AWK
for this task following way, let file.txt
content be
ID, IB, IM, IZ
0.05, 0.02, 0.01, 0.09
0.06, 0.01, , 0.08
0.02, 0.06,
then
awk 'BEGIN{FS=","}NR==1{split($0,names);next}{for(i=1;i<=NF;i+=1){counts[i]+=$i~/[^[:space:]]/}}END{for(i=1;i<=length(names);i+=1){print "Column",names[i]": "counts[i]}}' file.txt
output
Column ID: 3
Column IB: 3
Column IM: 1
Column IZ: 2
Explanation: I inform GNU AWK
that ,
is field separator, when processing 1st record split whole lines ($0
) into array names
, so ID
becomes names[1]
, IB
becomes names[2]
, IM
becomes names[3]
and so on. After doing that go to next
line. For all but 1st line iterate over columns using for
loop, for every line increase value of counts[i]
(where i
is number of column) by does that column contain non-whitespace character? which is 0
for false and 1
for true. In other words increase by 1 if non-whitespace character found else increase by 0. After processing all lines iterate over names
and print name with corresponding value of counts
.
(tested in gawk 4.2.1)
Upvotes: 1
Reputation: 29202
With GNU awk
:
awk -F'[[:space:]]*,[[:space:]]*' 'NR == 1 {header = $0; next} \
{for(i = 1; i <= NF; i++) n[i] += ($i ~ /\S/)} \
END {$0 = header; for(i = 1; i <= NF; i++) print "Column " $i ": " n[i]}' file
Column ID: 3
Column IB: 3
Column IM: 2
Column IZ: 1
Upvotes: 1