MonahAbouAlezz

Reputation: 55

How to count the number of lines of a specific entry under a specific pattern using awk?

I have a text file with a repeating pattern that looks like the following:

Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L

I'm trying to count how many entries there are for each feature in each sample. My desired output should look something like this:

Sample1
Feature 1: 3
Feature 2: 4

Sample2
Feature 1: 3
Feature 2: 2

I tried using the following awk command:

$ awk '{if(/^\Feature/){n=$0;}else{l[n]++}}
       END{for(n in l){print n" : "l[n]}}' inputfile.txt > result.txt

But it gave me the following output:

Feature 1: 6
Feature 2: 6

Could someone help me modify this command to get the desired output, or suggest another command? (P.S. The original file contains hundreds of samples and around 94 features.)

Upvotes: 0

Views: 81

Answers (3)

oliv

Reputation: 13249

You could use this awk:

awk '/^Sample/{printf "%s%s",(c?c"\n":""),$0;c=0;next}
     /^Feature/{printf "%s\n%s: ",(c?c:""),$0;c=0;next}
     {c++}
     END{print c}' file

The script increments the counter c only for lines that don't start with Sample or Feature.

When a line starts with one of those two keywords, the counter accumulated so far is printed and reset.
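The same counter idea can also be written with an explicit flush step. This is only a sketch, not part of the answer above: the names count_entries and emit are made up for illustration, and it assumes the file has exactly the layout shown in the question. As far as I can tell, the one practical difference is that a feature with no entries is reported as 0 instead of being left empty.

```shell
# count_entries: hypothetical helper; reads the sample/feature file on stdin
count_entries() {
    awk '
        # emit(): print the pending feature and its count, then reset both
        function emit() {
            if (feat != "") print feat ": " c
            feat = ""
            c = 0
        }
        /^Sample/  { emit(); print; next }      # new sample: flush, then echo its name
        /^Feature/ { emit(); feat = $0; next }  # new feature: flush the previous one
        NF         { c++ }                      # any other non-blank line is an entry
        END        { emit() }                   # flush the last feature
    '
}
```

It would be called as count_entries < inputfile.txt > result.txt.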

Upvotes: 1

Ed Morton

Reputation: 203229

$ cat tst.awk
BEGIN { OFS = ": " }
/Sample/  { prtFeat(); print (NR>1 ? ORS : "") $0; next }
/Feature/ { prtFeat(); name=$0; next }
{ ++cnt }
END { prtFeat() }
function prtFeat() {
    if (cnt) {
        print name, cnt
        cnt = 0
    }
}

$ awk -f tst.awk file
Sample1
Feature 1: 3
Feature 2: 4

Sample2
Feature 1: 3
Feature 2: 2

Upvotes: 0

anubhava

Reputation: 784998

This awk may also work:

awk '/^Sample/ {
   for (i in a)
      print i ": " a[i]
   print
   delete a
   next
}
/^Feature/ {
   f = $0
   next
}
{
   ++a[f]
}
END {
   for (i in a) 
      print i ": " a[i]
}' file

Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
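One caveat with any for (i in a) loop: awk does not specify the order in which it visits array keys, so with around 94 features the lines may not come out in file order. A possible workaround, sketched here with made-up names (count_in_order, dump, order, cnt), is to remember the order in which features first appear within a sample. It also clears the array with split("", cnt), which should work in awks that do not accept delete on a whole array.

```shell
# count_in_order: hypothetical helper; reads the sample/feature file on stdin
count_in_order() {
    awk '
        # dump(): print the counts in first-seen order, then clear the state
        function dump(   i) {
            for (i = 1; i <= n; i++)
                print order[i] ": " cnt[order[i]]
            split("", cnt)   # portable way to empty an array
            n = 0
        }
        /^Sample/ { dump(); print; next }
        /^Feature/ {
            f = $0
            if (!(f in cnt)) {   # first time this feature is seen in the sample
                order[++n] = f
                cnt[f] = 0
            }
            next
        }
        NF  { cnt[f]++ }         # non-blank entry line under the current feature
        END { dump() }
    '
}
```

Usage would be count_in_order < file.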

Upvotes: 0
