Reputation: 11
For some time now, I've been trying to write a bash script that reads a *.txt file and outputs word occurrences. I've had no luck so far. I know the algorithm; the only problem is the syntax.
How should this script work?
When I type ./myScript.sh myTextFile.txt
in a terminal, it should output the count, the word, and its percentage of all words, sorted from largest to smallest, like this:
17 is 7.1%
12 all 6.4%
10 house 5.5%
5 tree 3.7%
... and so on.
If I put a switch ./myScript.sh -x 3 myTextFile.txt
it should only output the words longer than 3 characters.
If I put a switch ./myScript.sh -y 4 myTextFile.txt
it should only output the words that occur 4 times or more. This is where I had the most trouble: determining which switches were used and what values they hold.
And of course, if I pass a file that does not exist or an invalid switch, the script should throw an error.
Thank you for all your help.
Upvotes: 1
Views: 309
Reputation: 13249
You can use awk to get the word count:
awk '{for(i=1;i<=NF;i++){a[$i]++;tot++}}END{for(j in a) {printf("%s %s %2.1f%%\n",a[j],j,a[j]/tot*100)}}' myTextFile.txt | sort -gr
This awk command fills the array a[], indexed by word, with the number of times each word occurs.
tot is the total number of words encountered.
The END block loops through the array and prints the count, the word, and its percentage of the total.
sort -g sorts the lines numerically by the leading count (ascending by default; adding -r puts the largest counts first).
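For the switches and the error handling, one possible approach is the getopts builtin. The following is only a minimal sketch, not a definitive implementation: the function name word_freq and the variables min_len and min_count are made up for illustration, and words are still split on whitespace exactly as in the awk command above, so punctuation and letter case are not normalized.

```shell
#!/usr/bin/env bash
# Sketch: word frequencies with two optional filters (names are illustrative).
#   -x N  keep only words longer than N characters
#   -y N  keep only words that occur N times or more
word_freq() {
    local OPTIND=1 opt min_len=0 min_count=1

    # The leading ':' puts getopts in silent mode so we can report errors ourselves.
    while getopts ':x:y:' opt; do
        case "$opt" in
            x) min_len=$OPTARG ;;
            y) min_count=$OPTARG ;;
            :) echo "Option -$OPTARG needs a value" >&2; return 1 ;;
            *) echo "Unknown option: -$OPTARG" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))   # drop the parsed switches, leaving the file name

    # Exactly one argument must remain, and it must be an existing regular file.
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "Usage: word_freq [-x minLength] [-y minCount] file" >&2
        return 1
    fi

    # Count every word, then print only the entries that pass both filters.
    awk -v len="$min_len" -v cnt="$min_count" '
        { for (i = 1; i <= NF; i++) { a[$i]++; tot++ } }
        END {
            for (w in a)
                if (length(w) > len && a[w] >= cnt)
                    printf("%d %s %.1f%%\n", a[w], w, a[w] / tot * 100)
        }' "$1" | sort -rn
}

# To use this as ./myScript.sh, end the file with:
# word_freq "$@"
```

With that last line enabled, ./myScript.sh -x 3 -y 4 myTextFile.txt would apply both filters at once. The percentages are still computed against the total number of words in the file, not against the filtered subset; whether that is the intended behavior is an assumption.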
Upvotes: 1