Reputation: 55
I have a text file whose contents follow this pattern:
Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L
I'm trying to count how many entries each feature has in each sample. So my desired output should look something like this:
Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
I tried using the following awk command:
$ awk '{if(/^\Feature/){n=$0;}else{l[n]++}}
END{for(n in l){print n" : "l[n]}}' inputfile.txt > result.txt
But it gave me the following output:
Feature 1: 6
Feature 2: 6
Can someone help me modify this command to get the desired output, or suggest another command? (P.S. the original file contains hundreds of samples and around 94 features.)
Upvotes: 0
Views: 81
Reputation: 13249
You could use this awk:
awk '/^Sample/{printf "%s%s",(c?c"\n":""),$0;c=0;next}
/^Feature/{printf "%s\n%s: ",(c?c:""),$0;c=0;next}
{c++}
END{print c}' file
The script increments the counter c only for lines that don't start with Sample or Feature. When a line starts with one of those two keywords, the pending count is printed and the counter is reset.
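The same counter idea can be spelled out with comments. This is only a sketch: the function names count_entries and emit are made up for illustration, and it assumes that every line which is not a Sample or Feature header is an entry. Unlike the one-liner above, it prints an explicit 0 for a feature that has no entries.

```shell
# count_entries is a hypothetical helper name; it reads the files given as
# arguments, or standard input when called without arguments.
count_entries() {
    awk '
        # New sample: print the count still pending for the previous
        # feature, then echo the sample header itself.
        /^Sample/  { emit(); print; next }

        # New feature: print the pending count, then remember this name.
        /^Feature/ { emit(); name = $0; next }

        # Any other line is treated as one entry of the current feature.
                   { count++ }

        # The last feature in the file still has to be printed.
        END        { emit() }

        # emit prints "name: count" if a feature is pending, then resets.
        function emit() {
            if (name != "")
                print name ": " (count + 0)
            name = ""
            count = 0
        }
    ' "$@"
}
```

It would be called as count_entries inputfile.txt > result.txt.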
Upvotes: 1
Reputation: 203229
$ cat tst.awk
BEGIN { OFS = ": " }
/Sample/ { prtFeat(); print (NR>1 ? ORS : "") $0; next }
/Feature/ { prtFeat(); name=$0; next }
{ ++cnt }
END { prtFeat() }
function prtFeat() {
    if (cnt) {
        print name, cnt
        cnt = 0
    }
}
$ awk -f tst.awk file
Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
Upvotes: 0
Reputation: 784998
This awk may also work:
awk '/^Sample/ {
    for (i in a)
        print i ": " a[i]
    print
    delete a
    next
}
/^Feature/ {
    f = $0
    next
}
{
    ++a[f]
}
END {
    for (i in a)
        print i ": " a[i]
}' file
Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
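One thing to keep in mind with for (i in a): awk does not guarantee the order in which array elements are visited, so with around 94 features the lines may not come out in file order. A possible sketch that remembers the order of first appearance is below. The names count_in_order, dump, cnt and order are made up for illustration, and split("", arr) is used as a portable way to empty an array.

```shell
# count_in_order is a hypothetical helper name; it reads the files given as
# arguments, or standard input when called without arguments.
count_in_order() {
    awk '
        # New sample: print the counts of the previous sample, then the header.
        /^Sample/ { dump(); print; next }

        # New feature: remember its name and, on first sight within this
        # sample, record its position so output follows file order.
        /^Feature/ {
            f = $0
            if (!(f in cnt)) {
                order[++n] = f
                cnt[f] = 0
            }
            next
        }

        # Any other line counts as one entry of the current feature.
        { cnt[f]++ }

        END { dump() }

        # dump prints the features in recorded order and clears the state.
        function dump(   i) {
            for (i = 1; i <= n; i++)
                print order[i] ": " cnt[order[i]]
            split("", cnt)
            split("", order)
            n = 0
        }
    ' "$@"
}
```

Usage would be count_in_order file.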
Upvotes: 0