Reputation: 55

How to count the number of lines of a specific entry under a specific pattern using awk?

I have a text file with a pattern that looks like the following:

Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L

I'm trying to count how many entries there are for each feature in each sample. My desired output should look something like this:

Sample1
Feature 1: 3
Feature 2: 4

Sample2
Feature 1: 3
Feature 2: 2

I tried using the following awk command:

$ awk '{if(/^\Feature/){n=$0;}else{l[n]++}}
       END{for(n in l){print n" : "l[n]}}' inputfile.txt > result.txt

But it gave me the following output:

Feature 1: 6
Feature 2: 6

Can someone help me modify this command to get the desired output, or suggest another command? (P.S. The original file contains hundreds of samples and around 94 features.)

Upvotes: 0

Answers (3)

oliv

Reputation: 13259

You could use this awk:

awk '/^Sample/{printf "%s%s",(c?c"\n":""),$0;c=0;next}
     /^Feature/{printf "%s\n%s: ",(c?c:""),$0;c=0;next}
     {c++}
     END{print c}' file

The script increments the counter c only for lines that don't start with Sample or Feature.

When a line starting with one of those two keywords is found, the current count is printed and the counter is reset.
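The same reset-and-print idea can be written out with named state, which may be easier to extend. This is only a sketch, not a replacement for the one-liner above: the names emit, feature and count are my own, and the here-document simply recreates the sample input from the question. One difference in behavior: a feature with no entries is printed with a count of 0 here, whereas the one-liner prints an empty count in that case.

```shell
# Sample input from the question, recreated with a here-document.
input=$(cat <<'EOF'
Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L
EOF
)

out=$(printf '%s\n' "$input" | awk '
  # emit() prints the pending feature with its count, then clears the state.
  # (emit, feature and count are illustrative names.)
  function emit() {
    if (feature != "")
      print feature ": " count
    feature = ""
    count = 0
  }
  /^Sample/  { emit(); print; next }        # new sample: flush, print its name
  /^Feature/ { emit(); feature = $0; next } # new feature: flush the previous one
  { count++ }                               # any other line is an entry
  END { emit() }                            # flush the last feature
')
printf '%s\n' "$out"
```

Because the state is cleared on every Sample line as well as every Feature line, counts never carry over from one sample to the next, which is what went wrong in the command from the question.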

Upvotes: 1

Ed Morton

Reputation: 204731

$ cat tst.awk
BEGIN { OFS = ": " }
/Sample/  { prtFeat(); print (NR>1 ? ORS : "") $0; next }
/Feature/ { prtFeat(); name=$0; next }
{ ++cnt }
END { prtFeat() }
function prtFeat() {
    if (cnt) {
        print name, cnt
        cnt = 0
    }
}

$ awk -f tst.awk file
Sample1
Feature 1: 3
Feature 2: 4

Sample2
Feature 1: 3
Feature 2: 2

Upvotes: 0

anubhava

Reputation: 786349

This awk may also work:

awk '/^Sample/ {
   for (i in a)
      print i ": " a[i]
   print
   delete a
   next
}
/^Feature/ {
   f = $0
   next
}
{
   ++a[f]
}
END {
   for (i in a) 
      print i ": " a[i]
}' file

Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
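
One caveat: awk does not specify the order in which for (i in a) visits the array, so with around 94 features the lines within a sample may not come out in file order on every awk. A possible sketch that remembers first-seen order is below; order, n, cnt and dump are names I made up for illustration, and the tiny input deliberately lists Feature 2 before Feature 1 to make the ordering visible.

```shell
out=$(printf '%s\n' 'Sample1' 'Feature 2' 'x' 'Feature 1' 'y' 'z' | awk '
  # dump() prints the counts in first-seen order and clears them.
  # (order, n, cnt and dump are illustrative names.)
  function dump(   i) {
    for (i = 1; i <= n; i++) {
      print order[i] ": " cnt[order[i]]
      delete cnt[order[i]]
    }
    n = 0
    f = ""
  }
  /^Sample/ { dump(); print; next }
  /^Feature/ {
    f = $0
    if (!(f in cnt)) {      # first time this feature appears in the sample
      order[++n] = f
      cnt[f] = 0
    }
    next
  }
  f != "" { cnt[f]++ }      # count entries only once a feature has been seen
  END { dump() }
')
printf '%s\n' "$out"
```

Deleting the elements one by one inside dump() avoids relying on delete applied to a whole array, which older awks may not accept.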

Upvotes: 0
