Reputation: 389
I have table in the following format. There exist >500 column and 113 rows where column1 is the identifier. I want to have only those identifier for which >90% of the entry values are greater than 1 i.e for the A1 if >90% of the values is greater than 1 than i want to print the total number of entries greater than 1 in the the last column and retain it. Any suggestion please.
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20
A2 14.44 23.73 55.27 68.77 14.18 0.05
A3 5.56 5.69 10.46 10.55 7.49 7.77
A4 1.06 3.62 1.68 1.38 1.90 6.64
A5 0.01 0.00 0.03 0.01 0.00 0.07
A6 0.07 0.72 27.68 19.70 2.33 0.00
A7 5.57 8.95 18.71 6.75 16.76 33.66
A8 0.86 2.30 1.65 0.92 2.01 0.92
A9 20.21 25.59 25.86 21.62 26.75 24.66
A10 28.05 28.26 22.48 27.41 32.28 26.94
A11 0.22 0.83 7.39 5.88 2.05 9.27
A12 13.90 19.43 28.51 25.48 21.44 29.24
A13 15.43 18.39 12.49 14.75 15.79 10.85
A14 3.92 13.00 14.13 8.18 13.92 23.83
A15 0.06 0.02 0.01 0.01 0.04 0.03
A16 0.99 2.46 6.08 4.56 3.81 3.43
A17 1.31 2.05 3.18 1.73 2.80 4.12
A18 3.60 7.90 8.57 5.56 7.18 12.20
A19 44.82 47.53 37.16 42.20 41.51 26.33
A20 1.59 2.88 2.55 3.05 3.08 2.88
I have very limited knowledge. I know how to count exact match with this awk '$0=$0OFS NF-1' FS=1.40
but not for greater or less condition.
I primarily want the output in the following format, where last column indicate number of entries >1.
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20 3
A2 14.44 23.73 55.27 68.77 14.18 0.05 5
A3 5.56 5.69 10.46 10.55 7.49 7.77 6
A4 1.06 3.62 1.68 1.38 1.90 6.64 6
A5 0.01 0.00 0.03 0.01 0.00 0.07 0
A6 0.07 0.72 27.68 19.70 2.33 0.00 3
A7 5.57 8.95 18.71 6.75 16.76 33.66 6
A8 0.86 2.30 1.65 0.92 2.01 0.92 3
A9 20.21 25.59 25.86 21.62 26.75 24.66 6
A10 28.05 28.26 22.48 27.41 32.28 26.94 6
A11 0.22 0.83 7.39 5.88 2.05 9.27 4
A12 13.90 19.43 28.51 25.48 21.44 29.24 6
A13 15.43 18.39 12.49 14.75 15.79 10.85 6
A14 3.92 13.00 14.13 8.18 13.92 23.83 6
A15 0.06 0.02 0.01 0.01 0.04 0.03 0
A16 0.99 2.46 6.08 4.56 3.81 3.43 5
A17 1.31 2.05 3.18 1.73 2.80 4.12 6
A18 3.60 7.90 8.57 5.56 7.18 12.20 6
A19 44.82 47.53 37.16 42.20 41.51 26.33 6
A20 1.59 2.88 2.55 3.05 3.08 2.88 6
Upvotes: 0
Views: 574
Reputation: 37424
$ awk '{for(i=1;i<=NF;i++) {if($i+0>1) c++; printf "%-5s%s", $i, (i==NF? OFS c ORS: OFS)}c=0}' file
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20 3
A2 14.44 23.73 55.27 68.77 14.18 0.05 5
A3 5.56 5.69 10.46 10.55 7.49 7.77 6
A4 1.06 3.62 1.68 1.38 1.90 6.64 6
A5 0.01 0.00 0.03 0.01 0.00 0.07 0
A6 0.07 0.72 27.68 19.70 2.33 0.00 3
A7 5.57 8.95 18.71 6.75 16.76 33.66 6
A8 0.86 2.30 1.65 0.92 2.01 0.92 3
A9 20.21 25.59 25.86 21.62 26.75 24.66 6
A10 28.05 28.26 22.48 27.41 32.28 26.94 6
A11 0.22 0.83 7.39 5.88 2.05 9.27 4
A12 13.90 19.43 28.51 25.48 21.44 29.24 6
A13 15.43 18.39 12.49 14.75 15.79 10.85 6
A14 3.92 13.00 14.13 8.18 13.92 23.83 6
A15 0.06 0.02 0.01 0.01 0.04 0.03 0
A16 0.99 2.46 6.08 4.56 3.81 3.43 5
A17 1.31 2.05 3.18 1.73 2.80 4.12 6
A18 3.60 7.90 8.57 5.56 7.18 12.20 6
A19 44.82 47.53 37.16 42.20 41.51 26.33 6
A20 1.59 2.88 2.55 3.05 3.08 2.88 6
.
{
for(i=1;i<=NF;i++) { # for each field
if($i+0>1) c++ # if field > 1, count
printf "%-5s%s", $i, (i==NF? OFS c ORS: OFS) # output nicely
}
c=0 # reset counter
}
Upvotes: 1
Reputation: 203995
$ awk 'NR>1{$0=$0"\t"NF-gsub(/^.|[[:space:]]0\./,"&")} 1' file
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20 3
A2 14.44 23.73 55.27 68.77 14.18 0.05 5
A3 5.56 5.69 10.46 10.55 7.49 7.77 6
A4 1.06 3.62 1.68 1.38 1.90 6.64 6
A5 0.01 0.00 0.03 0.01 0.00 0.07 0
A6 0.07 0.72 27.68 19.70 2.33 0.00 3
A7 5.57 8.95 18.71 6.75 16.76 33.66 6
A8 0.86 2.30 1.65 0.92 2.01 0.92 3
A9 20.21 25.59 25.86 21.62 26.75 24.66 6
A10 28.05 28.26 22.48 27.41 32.28 26.94 6
A11 0.22 0.83 7.39 5.88 2.05 9.27 4
A12 13.90 19.43 28.51 25.48 21.44 29.24 6
A13 15.43 18.39 12.49 14.75 15.79 10.85 6
A14 3.92 13.00 14.13 8.18 13.92 23.83 6
A15 0.06 0.02 0.01 0.01 0.04 0.03 0
A16 0.99 2.46 6.08 4.56 3.81 3.43 5
A17 1.31 2.05 3.18 1.73 2.80 4.12 6
A18 3.60 7.90 8.57 5.56 7.18 12.20 6
A19 44.82 47.53 37.16 42.20 41.51 26.33 6
A20 1.59 2.88 2.55 3.05 3.08 2.88 6
The gsub()
returns the count of times it could match it's regexp which is the first character in the line, ^.
or any numbers starting with 0.
so matches counts every number on the line except numbers that start with 1.
or greater. Then just subtract the gsub()
return value from the total number of fields NF
to get the count of numbers greater than 1 on each line.
Upvotes: 1