Reputation: 705
I have been trying to make the scripts I write simpler and simpler.
There are numerous ways to get the word count of all files in a folder, or even of all files in the subdirectories of a folder.
For instance, I could write
wc */*
and I might get output like this (this is the desired output):
0 0 0 10.53400000/YRI.GS000018623.NONSENSE.vcf
0 0 0 10.53400000/YRI.GS000018623.NONSTOP.vcf
0 0 0 10.53400000/YRI.GS000018623.PFAM.vcf
0 0 0 10.53400000/YRI.GS000018623.SPAN.vcf
0 0 0 10.53400000/YRI.GS000018623.SVLEN.vcf
2 20 624 10.53400000/YRI.GS000018623.SVTYPE.vcf
2 20 676 10.53400000/YRI.GS000018623.SYNONYMOUS.vcf
13 130 4435 10.53400000/YRI.GS000018623.TSS-UPSTREAM.vcf
425 4250 126381 10.53400000/YRI.GS000018623.UNKNOWN-INC.vcf
but if there are too many files, I might get an error message like this:
-bash: /usr/bin/wc: Argument list too long
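(For reference, a standard way around this limit is to let find batch the arguments itself; a sketch, assuming the files sit one level down as in the */* pattern:)
find . -mindepth 2 -maxdepth 2 -type f -exec wc {} +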
so, I could make a variable and do one folder at a time, like so:
while read -r FOLDER
do
wc $FOLDER/* >> outfile.txt
done < "$FOLDER_LIST"
So this goes from one line to five just like that.
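(For completeness, FOLDER_LIST here is assumed to be a file holding one directory name per line; folder_list.txt below is just an illustrative name:)
find . -mindepth 1 -maxdepth 1 -type d > folder_list.txt
FOLDER_LIST=folder_list.txt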
Further, in one case, I want to use grep -v first, then carry out the word counting, like so:
grep -v dbsnp */* | wc
but this would suffer from two errors: the same "Argument list too long" failure when there are many files, and wc would return a single combined count rather than one count per file.
So, to recap, I would love to be able to do this:
grep -v dbsnp */* | wc > Outfile.txt
awk '{print $4,$1}' Outfile.txt > Outfile.summary.txt
and have it return output like I showed above.
Is there a very simple way to do this? Or am I looking at a loop at minimum? Again, I know 101 ways to do this with a 4-10 line script, just like the rest of us, but I would love to be able to just type two one-liners into the command prompt... and my knowledge of the shell is not yet deep enough to know which approaches would allow what I am asking of the OS.
EDIT -
A solution was proposed:
find -exec grep -v dbsnp {} \; | xargs -n 1 wc
This solution leads to the following output:
wc: 1|0:53458644:AMBIGUOUS:CCAGGGC|-16&GCCAGGGCCAGGGC|-18&GCCAGGGCC|-19&GGCCAGGGC|-19&GCCAGGGCG|-19,.:48:48,48:4,4:0,17:-48,0,-48:0,0,-17:27:3,24:24: No such file or directory
wc: 10: No such file or directory
wc: 53460829: No such file or directory
wc: .: Is a directory
0 0 0 .
wc: AA: No such file or directory
wc: CT: No such file or directory
wc: .: Is a directory
0 0 0 .
wc: .: Is a directory
0 0 0 .
As nearly as I can tell, it appears to be treating each line as a file. I am still reviewing the other answers, and thanks for your help.
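(That is indeed what is happening: xargs splits the grep output on whitespace and hands each token to wc as a file name. A hypothetical one-liner, not from the original post, that reproduces errors of exactly the shape shown above:)
echo "some matched line" | xargs -n 1 wc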
Upvotes: 6
Views: 2255
Reputation: 3966
Based on perreal's answer:
If you want the wc file by file, you could use xargs:
find -exec grep -v dbsnp {} \; | xargs -n 1 wc
xargs can read standard input and build and execute command lines from it. So it reads your input stream and executes wc for each single item (-n 1).
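A quick illustration of -n with hypothetical input (not from the question):
printf '%s\n' a b c | xargs -n 1 echo item:
This runs echo three times, printing item: a, item: b, and item: c, one invocation per input item.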
Upvotes: 0
Reputation: 2160
You mentioned that "this does not solve the problem of returning the wc in an item-by-item fashion". The following will:
find -exec wc {} \;
But this won't include your grep -v filter.
If you intend to do the same as indicated by my comment on this answer, then please check whether the following works for you:
find -exec bash -c "echo -n {}; grep -v dbsnp {} | wc " \;
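A slightly more defensive variant of the same idea (a sketch, not from the original answer) passes the file name to bash as a positional parameter instead of splicing {} into the command string, so names containing spaces survive:
find -exec bash -c 'printf "%s " "$1"; grep -v dbsnp "$1" | wc' _ {} \;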
Upvotes: 3
Reputation: 1291
This works for me:
grep -or "[a-zA-Z]*" * | cut -d":" -f2 | sort | uniq -c
What you're looking for is the MapReduce algorithm: http://en.wikipedia.org/wiki/MapReduce
Upvotes: 0
Reputation: 98068
You have too many matches to the */* pattern, so grep receives a long argument list. You can use find to circumvent this:
find -exec grep -v dbsnp {} \; | wc
and perhaps you want to get rid of possible traversal errors too:
find -exec grep -v dbsnp {} \; 2> /dev/null | wc
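If spawning one grep per file proves slow on a large tree, grep's -h option (suppress the file-name prefix) should let find batch the arguments with + without skewing the counts; a sketch:
find -type f -exec grep -hv dbsnp {} + 2> /dev/null | wc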
Upvotes: 2