James Smith

Reputation: 11

Bash Shell Script For Word Counting

For some time now, I've been trying to write a bash script that reads a *.txt file and outputs word occurrences. I've had no luck so far. I know the algorithm; the only problem is the syntax.

How should this script work?

When I type ./myScript.sh myTextFile.txt in a terminal, it should print every word with its count and its percentage of all words, sorted from most to least frequent, like this:

17 is 7.1%  
12 all 6.4%  
10 house 5.5%  
5 tree 3.7%  

... and so on.

If I pass a switch, as in ./myScript.sh -x 3 myTextFile.txt, it should output only the words longer than 3 characters.

If I pass a switch, as in ./myScript.sh -y 4 myTextFile.txt, it should output only the words that occur 4 times or more. This is where I had the most trouble: determining which switches were used and what values they hold.

And of course, if I pass a file that does not exist or an invalid switch, the script should throw an error.

Thank you for all your help.

Upvotes: 1

Views: 309

Answers (1)

oliv

Reputation: 13249

You can use awk to get the word count:

 awk '{for(i=1;i<=NF;i++){a[$i]++;tot++}}END{for(j in a){printf("%s %s %.1f%%\n",a[j],j,a[j]/tot*100)}}' myTextFile.txt | sort -gr

This awk command fills the array a[], which is indexed by word, with the number of times each word occurs.

tot is the total number of words encountered.

The END block loops through the array and prints the count, word, and percentage for each entry.

sort -g performs a general numeric sort on the leading count; adding -r reverses the order so that the most frequent words come first.
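The one-liner does not cover the -x and -y switches or the error cases from the question. A minimal sketch of one way to add them, assuming bash and its getopts builtin: the function name word_freq and the variables min_len and min_count are made up for illustration, and the percentages are assumed to stay relative to all words in the file, so the switches only filter what is printed.

```shell
#!/usr/bin/env bash
# Sketch only: word_freq, min_len and min_count are illustrative names.
#   -x N  print only words longer than N characters
#   -y N  print only words that occur N times or more
word_freq() {
    local min_len=0 min_count=1 opt val OPTIND=1

    # "x:" and "y:" mean each switch takes a value; the leading ":" lets
    # us print our own messages for bad or incomplete switches.
    while getopts ":x:y:" opt; do
        case "$opt" in
            x) min_len=$OPTARG ;;
            y) min_count=$OPTARG ;;
            :) echo "Error: -$OPTARG needs a value" >&2; return 1 ;;
            *) echo "Error: unknown switch -$OPTARG" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))   # drop the parsed switches, leaving the file name

    # Both switch values must be non-negative integers.
    for val in "$min_len" "$min_count"; do
        case "$val" in
            ''|*[!0-9]*) echo "Error: -x and -y expect a number" >&2; return 1 ;;
        esac
    done

    # Exactly one readable file is required.
    if [ "$#" -ne 1 ] || [ ! -r "$1" ]; then
        echo "Usage: word_freq [-x minLength] [-y minCount] file" >&2
        return 1
    fi

    # tot counts every word, so percentages are relative to the whole file.
    awk -v len="$min_len" -v cnt="$min_count" '
        { for (i = 1; i <= NF; i++) { a[$i]++; tot++ } }
        END {
            for (w in a)
                if (length(w) > len && a[w] >= cnt)
                    printf "%d %s %.1f%%\n", a[w], w, a[w] / tot * 100
        }' "$1" | sort -rn
}
```

To use it as ./myScript.sh, save the function in that file and add word_freq "$@" as the last line. A missing file or an invalid switch then prints a message to stderr, and the function returns 1.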

Upvotes: 1
