Cnvrsn

Reputation: 21

Bash code: word occurrence counting within a directory of texts

I'm trying to find out how to count how many files each word occurs in. For example, I have a directory with 10 recipe texts, and I want to find out how many of the texts contain the word 'pepper', with a result like '8 pepper'.

I know how to do word counts and the like, but this is a bit over my head, I think. I would really appreciate some help.

As an example of the kind of thing I'm talking about, this is a word count command I figured out:

cat test.txt | tr '[A-Z]' '[a-z]' | tr -d '[:punct:]' | tr ' ' '\n' | sort | uniq -c

Upvotes: 1

Views: 162

Answers (3)

Alois Mahdal

Reputation: 11253

find . -type f -print0 \
  | xargs -0 cat \
  | tr -c '[:alpha:]' '\n' \
  | tr '[:upper:]' '[:lower:]' \
  | sort \
  | uniq -c \
  | grep -w pepper

This

  1. finds all files in the current directory and its subdirectories;

  2. concatenates them, replacing everything that is not a letter with a newline (this produces lines with single words, and a lot of empty lines);

  3. converts to lowercase (use of POSIX classes will preserve non-US characters);

  4. sorts, and collapses identical lines into counts, producing something like a word frequency list:

    42 
    16 add
    9 the
    8 jalapeño
    8 pepper
    7 lot
    
  5. and filters that result to show only the line 8 pepper.

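You might want to replace or improve the tr command depending on what you expect in the files, or qualify the find to match only files with a certain name pattern, etc. Note that this counts every occurrence of a word across all files, so a word that appears twice in one file is counted twice.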
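Since the question asks for the number of files each word occurs in, one possible variation is to de-duplicate the words of each file before counting. The following is only a minimal sketch, not part of the pipeline above: the function name doc_freq is made up for illustration, and it assumes plain text files where splitting on non-letters is good enough.

```shell
# doc_freq [DIR]: print "<number of files> <word>" for every word found
# in the regular files under DIR (default: current directory).
# doc_freq is a hypothetical helper name used only for this sketch.
doc_freq() {
  find "${1:-.}" -type f -exec sh -c '
    # One lowercase word per line, each word kept at most once per file.
    tr -cs "[:alpha:]" "\n" < "$1" | tr "[:upper:]" "[:lower:]" | sort -u
  ' sh {} \; \
    | grep -v '^$' \
    | sort \
    | uniq -c
}
```

Because each file contributes a word at most once, the number printed by uniq -c is the number of files containing that word. For a single word, something like doc_freq recipes | grep -w pepper should do, where recipes is a placeholder directory name.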

Upvotes: 3

repzero

Reputation: 8402

Consider the following:

 find <directory path>  -name "*pepper*" -type f  |wc -l

This lists all files whose names contain pepper and counts them. Note that it matches file names, not file contents.

Another alternative (if you are in the directory where your recipes are):

ls | grep pepper | wc -l
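
Both commands above look at file names. If the goal is to count files whose contents mention the word, find can still do the name filtering and hand the matching files to grep -l. This is a rough sketch under that assumption; count_named and its three arguments are invented for illustration.

```shell
# count_named DIR GLOB WORD: count the files under DIR whose name matches
# GLOB and whose contents include WORD.
# count_named is a hypothetical helper name used only for this sketch.
count_named() {
  # grep -l prints each matching file name once; wc -l counts those names.
  find "$1" -type f -name "$2" -exec grep -l -- "$3" {} + | wc -l
}
```

For example, count_named . '*.txt' pepper would count only the .txt files that contain pepper. File names containing newlines would throw the count off.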

Upvotes: 1

Mark Reed

Reputation: 95375

How about grep -l? For instance, grep -l pepper * will list all the files that contain "pepper", and grep -l pepper * | wc -l will tell you how many such files there are.
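
A plain grep -l pepper * also matches "Pepper" only if the case is the same, and it matches longer words such as "peppers". If that matters, something along these lines might be closer to what the question wants. It is a sketch that assumes a grep supporting -r and -w (GNU and BSD grep both do); count_files is a made-up name.

```shell
# count_files WORD [DIR]: number of files under DIR (default: current
# directory) that contain WORD as a whole word, ignoring case.
# count_files is a hypothetical helper name used only for this sketch.
count_files() {
  # -r: recurse, -l: print file names only, -i: ignore case, -w: whole words
  grep -rliw -- "$1" "${2:-.}" | wc -l
}
```

To get output in the '8 pepper' shape from the question, the result can be wrapped as echo "$(count_files pepper) pepper".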

Upvotes: 1
