Reputation: 15
I have a .txt file with a list of words (gene names, separated by newlines) and I want to count their occurrences in multiple files across multiple folders.
The folders are organized like this: MainFolder/family_ID/variants/FILE.table
One folder for each family.
I tried with grep; it does count, but it outputs one line per file:
WDFY3 0
WDFY3 0
WDFY3 1
WDFY3 0
WDFY3 0
KMT2C 1
KMT2C 0
KMT2C 0
KMT2C 0
KMT2C 0
I want it this way instead:
WDFY3 1
KMT2C 1
Here's the code I used:
while read p; do
  grep -FRchi "$p" --include \*.FILE.table | sed "s/^/$p /" >> /MyData/MainFolder/count.txt
done < /MyData/Resources/gene_list.txt
Is it possible with grep? Should I use awk/sed?
Thank you
Upvotes: 0
Views: 239
Reputation: 12877
Take the output from your script and pipe it to:
awk '{ arry[$1]+=$2 } END { for (i in arry) { print i" "arry[i] } }'
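For example, the whole loop from the question can be piped straight into that awk command instead of appending to count.txt (a sketch reusing the question's paths and grep options; adjust if your layout differs):
# Sum the per-file counts produced by the loop; prints one "GENE total" line per gene.
while read p; do
  grep -FRchi "$p" --include \*.FILE.table | sed "s/^/$p /"
done < /MyData/Resources/gene_list.txt |
awk '{ arry[$1]+=$2 } END { for (i in arry) { print i" "arry[i] } }'
If you have already generated count.txt, you can also run the awk command directly on that file (awk '...' /MyData/MainFolder/count.txt) to get the same summed result.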
Upvotes: 0
Reputation: 117298
One way is to make grep output all the matching lines, sort them, and then count them:
#!/bin/bash
genes=/MyData/Resources/gene_list.txt
grep -RhioFf "$genes" --include 'FILE.table' | sort | uniq -c
This will output the count in the first column and the gene in the second.
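If you want the gene first and the count second, as in the desired output above, you can append a small awk step to swap the columns (a minor addition, assuming the usual whitespace-separated uniq -c output):
grep -RhioFf "$genes" --include 'FILE.table' | sort | uniq -c | awk '{ print $2, $1 }'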
Upvotes: 1