Parth Parikh

Reputation: 260

How to count occurrences of a specific word in a group of files with bash/shell script

I have two text files, 'simple.txt' and 'simple1.txt', with the following data in them:

    simple.txt--

    hello
    hi hi hello
    this
    is it

    simple1.txt--
    hello hi
    how are you



[]$ tr ' ' '\n' < simple.txt | grep  -i -c '\bh\w*'
4
[]$ tr ' ' '\n' < simple1.txt | grep  -i -c '\bh\w*'
3

These commands show the number of words that start with "h" in each file, but I want to display the total count, 7, i.e. the total across both files. Can I do this in a single command or shell script?

P.S.: I had to write two commands as tr does not take two file names.

Upvotes: 3

Views: 561

Answers (3)

user1934428

Reputation: 22217

It is not the case that tr accepts only one filename; it does not accept any filename at all and always reads from stdin. That's why, even in your solution, you didn't give tr a filename but used input redirection.

In your case, I think you can replace tr by fmt, which does accept filenames:

fmt -1 simple.txt simple1.txt | grep -i -c -w 'h.*'

(I also changed the grep a bit, because I personally find it more readable this way, but this is a matter of taste.)

Note that both solutions (mine and your original one) treat a whitespace-delimited string that contains several h-words joined by non-space characters, for instance haaaa.hbbbbbb.hccccc, as a single block. Because grep -c counts matching lines rather than individual matches, such a string adds only 1 to the count of "h"-words, not 3. Whether this is the desired behaviour is up to you to decide.
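If each h-word inside such a joined token should be counted separately, one possible approach is to let grep print every match on its own line with -o and count those lines with wc -l. This is only a sketch: the sample string is made up for illustration, and \b and \w are extensions that GNU grep supports but other greps may not.

```shell
# Hypothetical sample input: the first whitespace-delimited token holds three h-words.
sample='haaaa.hbbbbbb.hccccc hello this'

# grep -c counts matching LINES, so the joined token contributes only 1.
per_line=$(printf '%s\n' "$sample" | tr ' ' '\n' | grep -i -c '\bh\w*')

# grep -o prints each MATCH on its own line; wc -l then counts the matches.
per_match=$(printf '%s\n' "$sample" | grep -i -o '\bh\w*' | wc -l)

echo "lines: $per_line, matches: $per_match"
```

With GNU grep, the line-based count should come out lower than the match-based count here, since the first token is one line but contains three matches.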

Upvotes: 0

Gilles Quénot

Reputation: 185151

Try this, the straightforward way:

cat simple.txt simple1.txt | tr ' ' '\n' | grep  -i -c '\bh\w*'
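Since the question also asks about a shell script, the same pipeline could be wrapped in a small function that accepts any number of files. This is a minimal sketch: count_h_words is a hypothetical name chosen for illustration, and the sample files are recreated in a temporary directory so the snippet runs on its own.

```shell
# count_h_words is a hypothetical helper name, not part of any tool.
# It prints the total number of words starting with "h" (any case)
# across all files passed as arguments.
count_h_words() {
    cat -- "$@" | tr ' ' '\n' | grep -i -c '\bh\w*'
}

# Recreate the two sample files from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'hello' 'hi hi hello' 'this' 'is it' > "$tmpdir/simple.txt"
printf '%s\n' 'hello hi' 'how are you' > "$tmpdir/simple1.txt"

total=$(count_h_words "$tmpdir/simple.txt" "$tmpdir/simple1.txt")
echo "$total"

rm -rf "$tmpdir"
```

Because cat concatenates its arguments before tr sees them, the function works the same way for two files or twenty, e.g. count_h_words *.txt.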

Upvotes: 4

John1024

Reputation: 113844

This alternative requires no pipelines:

$ awk -v RS='[[:space:]]+' '/^h/{i++} END{print i+0}' simple.txt simple1.txt
7

How it works

  • -v RS='[[:space:]]+'

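    This sets the record separator to any run of whitespace, so awk treats each word as a record. (A regular expression as RS is supported by gawk and mawk but is not guaranteed by POSIX.)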

  • /^h/{i++}

    For any record (word) that starts with a lowercase h, we increment the variable i by 1. Unlike grep -i, this match is case-sensitive.

  • END{print i+0}

    After we have finished reading all the files, we print out the value of i. The +0 forces a numeric result, so 0 is printed if no word matched.
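A variant that avoids the regular-expression RS and also ignores case, as grep -i does, might loop over the fields of each line instead. This is a sketch under assumptions: the variable names f and n are arbitrary, and one "hello" in the sample data has been capitalized (unlike the question's files) to show the case-insensitive match.

```shell
# Recreate the sample files; note the capitalized "Hello", which is NOT in the
# question's data and is added here only to exercise tolower().
tmpdir=$(mktemp -d)
printf '%s\n' 'hello' 'hi hi hello' 'this' 'is it' > "$tmpdir/simple.txt"
printf '%s\n' 'Hello hi' 'how are you' > "$tmpdir/simple1.txt"

# With the default field splitting, each whitespace-separated word is a field.
# Loop over the fields and count those whose lowercased form starts with "h".
awk_total=$(awk '{
    for (f = 1; f <= NF; f++)
        if (tolower($f) ~ /^h/) n++
} END { print n + 0 }' "$tmpdir/simple.txt" "$tmpdir/simple1.txt")
echo "$awk_total"

rm -rf "$tmpdir"
```

This relies only on default field splitting and tolower(), both of which are part of POSIX awk, so it should behave the same across awk implementations.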

Upvotes: 2
