Emil Cioran

Reputation: 15

Count occurrences of a list of words in multiple files

I have a .txt with a list of words (gene names, separated with newlines) and I want to count their occurrences in multiple files in multiple folders.

The folders are laid out like this: MainFolder/family_ID/variants/FILE.table

There is one folder for each family.

I tried grep. It does count, but it outputs one line per file:

WDFY3 0
WDFY3 0
WDFY3 1
WDFY3 0
WDFY3 0
KMT2C 1
KMT2C 0
KMT2C 0
KMT2C 0
KMT2C 0

I want the output to look like this:

WDFY3 1
KMT2C 1

Here's the code I used:

while read p; do
    grep -FRchi "$p" --include \*.FILE.table | sed "s/^/$p /" >> /MyData/MainFolder/count.txt
done < /MyData/Resources/gene_list.txt

Is it possible with grep? Should I use awk/sed?

Thank you

Upvotes: 0

Views: 239

Answers (2)

Raman Sailopal

Reputation: 12877

Take the output from your script and pipe it to:

awk '{ arry[$1]+=$2 } END { for (i in arry) { print i" "arry[i] } }' 
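As a minimal sketch of that aggregation step, here it is applied to a few lines of the sample output from the question. The function name sum_counts is hypothetical, and the trailing sort is an addition of mine, since awk does not guarantee any order for "for (i in ...)" loops.

```shell
#!/bin/bash

# Hypothetical helper: reads "GENE COUNT" lines on stdin and
# prints one line per gene with the counts summed.
sum_counts() {
    awk '{ total[$1] += $2 } END { for (gene in total) print gene, total[gene] }' | sort
}

# Sample input copied from the per-file output shown in the question.
printf '%s\n' 'WDFY3 0' 'WDFY3 0' 'WDFY3 1' 'KMT2C 1' 'KMT2C 0' | sum_counts
```

To use it with the original loop, the per-file lines already written to count.txt could be fed through the same function instead of the printf sample.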

Upvotes: 0

Ted Lyngmo

Reputation: 117298

One way is to make grep output all the lines, sort them and then count them:

#!/bin/bash

genes=/MyData/Resources/gene_list.txt

grep -RhioFf "$genes" --include 'FILE.table' | sort | uniq -c

This outputs the count in the first column and the gene in the second. Genes that never match do not appear in the output at all, so there are no zero-count lines.
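One thing to watch with -i combined with -o: grep prints each match as it is spelled in the file, so "wdfy3" and "WDFY3" would be counted as separate entries. A possible way around that, sketched below, is to normalize the case before counting and swap the columns to get the "GENE count" layout from the question. The function name count_matches is hypothetical, and folding to upper case assumes the gene names in the list are upper case.

```shell
#!/bin/bash

# Hypothetical helper: reads one matched gene name per line on stdin,
# folds it to upper case (assumption: the gene list is upper case),
# then prints "GENE COUNT" for each distinct name.
count_matches() {
    tr '[:lower:]' '[:upper:]' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Intended use, with the paths from the question:
#   grep -RhioFf /MyData/Resources/gene_list.txt --include 'FILE.table' /MyData/MainFolder | count_matches

# Small demonstration with made-up matches in mixed case.
printf '%s\n' 'wdfy3' 'WDFY3' 'KMT2C' | count_matches
```

If partial matches inside longer names are unwanted, adding -w to the grep options restricts it to whole-word matches.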

Upvotes: 1
