Ashi
Ashi

Reputation: 389

count the number of occurrences of a number greater than 1 in a row

I have table in the following format. There exist >500 column and 113 rows where column1 is the identifier. I want to have only those identifier for which >90% of the entry values are greater than 1 i.e for the A1 if >90% of the values is greater than 1 than i want to print the total number of entries greater than 1 in the the last column and retain it. Any suggestion please.

Id M1 M2 M3 M4 M5 M6 A1 0.82 0.73 1.40 0.52 1.84 3.20 A2 14.44 23.73 55.27 68.77 14.18 0.05 A3 5.56 5.69 10.46 10.55 7.49 7.77 A4 1.06 3.62 1.68 1.38 1.90 6.64 A5 0.01 0.00 0.03 0.01 0.00 0.07 A6 0.07 0.72 27.68 19.70 2.33 0.00 A7 5.57 8.95 18.71 6.75 16.76 33.66 A8 0.86 2.30 1.65 0.92 2.01 0.92 A9 20.21 25.59 25.86 21.62 26.75 24.66 A10 28.05 28.26 22.48 27.41 32.28 26.94 A11 0.22 0.83 7.39 5.88 2.05 9.27 A12 13.90 19.43 28.51 25.48 21.44 29.24 A13 15.43 18.39 12.49 14.75 15.79 10.85 A14 3.92 13.00 14.13 8.18 13.92 23.83 A15 0.06 0.02 0.01 0.01 0.04 0.03 A16 0.99 2.46 6.08 4.56 3.81 3.43 A17 1.31 2.05 3.18 1.73 2.80 4.12 A18 3.60 7.90 8.57 5.56 7.18 12.20 A19 44.82 47.53 37.16 42.20 41.51 26.33 A20 1.59 2.88 2.55 3.05 3.08 2.88

I have very limited knowledge. I know how to count exact match with this awk '$0=$0OFS NF-1' FS=1.40 but not for greater or less condition.

I primarily want the output in the following format, where last column indicate number of entries >1.

Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20 3 A2 14.44 23.73 55.27 68.77 14.18 0.05 5 A3 5.56 5.69 10.46 10.55 7.49 7.77 6 A4 1.06 3.62 1.68 1.38 1.90 6.64 6 A5 0.01 0.00 0.03 0.01 0.00 0.07 0 A6 0.07 0.72 27.68 19.70 2.33 0.00 3 A7 5.57 8.95 18.71 6.75 16.76 33.66 6 A8 0.86 2.30 1.65 0.92 2.01 0.92 3 A9 20.21 25.59 25.86 21.62 26.75 24.66 6 A10 28.05 28.26 22.48 27.41 32.28 26.94 6 A11 0.22 0.83 7.39 5.88 2.05 9.27 4 A12 13.90 19.43 28.51 25.48 21.44 29.24 6 A13 15.43 18.39 12.49 14.75 15.79 10.85 6 A14 3.92 13.00 14.13 8.18 13.92 23.83 6 A15 0.06 0.02 0.01 0.01 0.04 0.03 0 A16 0.99 2.46 6.08 4.56 3.81 3.43 5 A17 1.31 2.05 3.18 1.73 2.80 4.12 6 A18 3.60 7.90 8.57 5.56 7.18 12.20 6 A19 44.82 47.53 37.16 42.20 41.51 26.33 6 A20 1.59 2.88 2.55 3.05 3.08 2.88 6

Upvotes: 0

Views: 574

Answers (2)

James Brown
James Brown

Reputation: 37424

$ awk '{for(i=1;i<=NF;i++) {if($i+0>1) c++; printf "%-5s%s", $i, (i==NF? OFS c ORS: OFS)}c=0}' file
Id    M1    M2    M3    M4    M5    M6
A1    0.82  0.73  1.40  0.52  1.84  3.20  3
A2    14.44 23.73 55.27 68.77 14.18 0.05  5
A3    5.56  5.69  10.46 10.55 7.49  7.77  6
A4    1.06  3.62  1.68  1.38  1.90  6.64  6
A5    0.01  0.00  0.03  0.01  0.00  0.07  0
A6    0.07  0.72  27.68 19.70 2.33  0.00  3
A7    5.57  8.95  18.71 6.75  16.76 33.66 6
A8    0.86  2.30  1.65  0.92  2.01  0.92  3
A9    20.21 25.59 25.86 21.62 26.75 24.66 6
A10   28.05 28.26 22.48 27.41 32.28 26.94 6
A11   0.22  0.83  7.39  5.88  2.05  9.27  4
A12   13.90 19.43 28.51 25.48 21.44 29.24 6
A13   15.43 18.39 12.49 14.75 15.79 10.85 6
A14   3.92  13.00 14.13 8.18  13.92 23.83 6
A15   0.06  0.02  0.01  0.01  0.04  0.03  0
A16   0.99  2.46  6.08  4.56  3.81  3.43  5
A17   1.31  2.05  3.18  1.73  2.80  4.12  6
A18   3.60  7.90  8.57  5.56  7.18  12.20 6
A19   44.82 47.53 37.16 42.20 41.51 26.33 6
A20   1.59  2.88  2.55  3.05  3.08  2.88  6

.

{
    for(i=1;i<=NF;i++) {                             # for each field
        if($i+0>1) c++                               # if field > 1, count
        printf "%-5s%s", $i, (i==NF? OFS c ORS: OFS) # output nicely
    }
    c=0                                              # reset counter
}

Upvotes: 1

Ed Morton
Ed Morton

Reputation: 203995

$ awk 'NR>1{$0=$0"\t"NF-gsub(/^.|[[:space:]]0\./,"&")} 1' file
Id M1  M2  M3  M4  M5  M6
A1  0.82    0.73    1.40    0.52    1.84    3.20        3
A2  14.44   23.73   55.27   68.77   14.18   0.05        5
A3  5.56    5.69    10.46   10.55   7.49    7.77        6
A4  1.06    3.62    1.68    1.38    1.90    6.64        6
A5  0.01    0.00    0.03    0.01    0.00    0.07        0
A6  0.07    0.72    27.68   19.70   2.33    0.00        3
A7  5.57    8.95    18.71   6.75    16.76   33.66       6
A8  0.86    2.30    1.65    0.92    2.01    0.92        3
A9  20.21   25.59   25.86   21.62   26.75   24.66       6
A10 28.05   28.26   22.48   27.41   32.28   26.94       6
A11 0.22    0.83    7.39    5.88    2.05    9.27        4
A12 13.90   19.43   28.51   25.48   21.44   29.24       6
A13 15.43   18.39   12.49   14.75   15.79   10.85       6
A14 3.92    13.00   14.13   8.18    13.92   23.83       6
A15 0.06    0.02    0.01    0.01    0.04    0.03        0
A16 0.99    2.46    6.08    4.56    3.81    3.43        5
A17 1.31    2.05    3.18    1.73    2.80    4.12        6
A18 3.60    7.90    8.57    5.56    7.18    12.20       6
A19 44.82   47.53   37.16   42.20   41.51   26.33       6
A20 1.59    2.88    2.55    3.05    3.08    2.88        6

The gsub() returns the count of times it could match it's regexp which is the first character in the line, ^. or any numbers starting with 0. so matches counts every number on the line except numbers that start with 1. or greater. Then just subtract the gsub() return value from the total number of fields NF to get the count of numbers greater than 1 on each line.

Upvotes: 1

Related Questions