Reputation: 21
I'm trying to find out how to count, for each word, how many files in a directory it occurs in. For example, I have a directory with 10 recipe texts and I want to find out how many of the texts contain the word 'pepper', with a result like '8 pepper'.
I know how to do word counts and the like, but this is a bit over my head, so I would really appreciate some help.
As an example of what I'm talking about, this is a word count command I figured out:
cat test.txt | tr '[A-Z]' '[a-z]' | tr -d '[:punct:]' | tr ' ' '\n' | sort | uniq -c
Upvotes: 1
Views: 162
Reputation: 11253
find . -type f -exec cat {} + \
| tr -c '[:alpha:]' '\n' \
| tr '[:upper:]' '[:lower:]' \
| sort \
| uniq -c \
| grep -w pepper
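This:
finds all files in the current directory and its subdirectories;
concatenates them, replacing everything that is not a letter with a newline (this produces one word per line, plus a lot of empty lines);
converts to lowercase (POSIX classes such as [:alpha:] and [:upper:] can also cover non-ASCII letters, depending on your locale and tr implementation; GNU tr, for instance, works byte by byte and does not handle multibyte UTF-8 characters);
sorts, and collapses identical lines into one count per word, producing a word occurrence table like this (the bare 42 counts the empty lines):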
42
16 add
9 the
8 jalapeño
8 pepper
7 lot
and finally filters that result to show only the line '8 pepper'.
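You might want to replace or improve the tr commands depending on what you expect in the files, or restrict find to files matching a certain name pattern (for example -name '*.txt'). Note that this pipeline counts total occurrences across all files, so a word used twice in one recipe is counted twice; to count files instead, each word has to be kept only once per file before the final uniq -c.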
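A minimal sketch of that per-file variant, assuming plain-text files and treating a word as a run of letters. The function name files_per_word is a hypothetical helper, not an existing command; the key change is running tr and sort -u on each file separately, so every word appears at most once per file before uniq -c counts it.

```shell
#!/bin/sh
# Sketch: for every word, count how many files under a directory contain it.
# files_per_word is a hypothetical helper name, not a standard command.
# Assumes plain-text files; a "word" is a run of letters.
files_per_word() {
  # For each file: one word per line (-s squeezes runs of newlines),
  # lowercase, then sort -u so a word repeated inside one file is kept once.
  find "${1:-.}" -type f -exec sh -c '
    for f; do
      tr -cs "[:alpha:]" "\n" < "$f" | tr "[:upper:]" "[:lower:]" | sort -u
    done' sh {} + \
  | grep -v '^$' \
  | sort \
  | uniq -c \
  | sort -rn
  # grep -v drops the empty line a file produces if it starts with a
  # non-letter; uniq -c now counts files per word; sort -rn puts the
  # most widespread words first.
}
```

Assuming the recipes live in a directory called recipes, something like files_per_word recipes | grep -w pepper would print the number of recipes containing 'pepper' followed by the word, in the same 'count word' form as uniq -c.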
Upvotes: 3
Reputation: 8402
Consider the following
find <directory path> -name "*pepper*" -type f |wc -l
This lists and counts all files whose names contain "pepper". Note that it matches file names only; it does not look inside the files.
Another alternative (if you are in the directory where your recipes are):
ls | grep pepper | wc -l
Upvotes: 1
Reputation: 95375
How about grep -l? For instance, grep -l pepper * will list all the files that contain "pepper", and grep -l pepper * | wc -l will just tell you how many such files there are.
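Building on grep -l, a small sketch that loops over several words and prints each count in the '8 pepper' form the question asks for. count_files_with is a hypothetical name; -i makes the match case-insensitive and -w restricts it to whole words, so 'Pepper' counts but 'peppercorn' does not (both options are widely supported, though -w is not strictly POSIX).

```shell
#!/bin/sh
# Sketch: for each word given as an argument, print how many files in the
# current directory contain it at least once, as "count word".
# count_files_with is a hypothetical helper name, not a standard command.
count_files_with() {
  for word; do
    # -l prints each matching file name once, so wc -l counts files, not lines.
    # -i ignores case; -w matches whole words only (not strictly POSIX).
    # 2>/dev/null hides warnings such as "Is a directory" for subdirectories.
    n=$(grep -liw -- "$word" ./* 2>/dev/null | wc -l | tr -d ' ')
    printf '%s %s\n' "$n" "$word"
  done
}
```

Run from the recipe directory, count_files_with pepper salt garlic prints one 'count word' line per argument. Note that ./* skips hidden files and does not descend into subdirectories.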
Upvotes: 1