bopie

Reputation: 67

Use bash to count every word's occurrence in a file

I want to count every word's occurrence in a file, but the result is wrong.

#!/bin/bash
#usage: count.sh file

declare -a dict

for word in $(cat $1)
do
    if [ ${dict[$word]} == "" ] ;then
        dict[$word]=0
    else
        dict[$word]=$[${dict[$word]} + 1]
    fi
done

for word in ${!dict[@]}
do
    echo $word: ${dict[$word]}
done

I use the test file below:

learning the bash shell
this is second line
this is the last line

Running bash -x count.sh file gives this result:

+ declare -a dict
++ cat book
+ for word in '$(cat $1)'
+ '[' '' == '' ']'
+ dict[$word]=0
+ for word in '$(cat $1)'
+ '[' 0 == '' ']'
+ dict[$word]=1
+ for word in '$(cat $1)'
+ '[' 1 == '' ']'
+ dict[$word]=2
+ for word in '$(cat $1)'
+ '[' 2 == '' ']'
+ dict[$word]=3
+ for word in '$(cat $1)'
+ '[' 3 == '' ']'
+ dict[$word]=4
+ for word in '$(cat $1)'
+ '[' 4 == '' ']'
+ dict[$word]=5
+ for word in '$(cat $1)'
+ '[' 5 == '' ']'
+ dict[$word]=6
+ for word in '$(cat $1)'
+ '[' 6 == '' ']'
+ dict[$word]=7
+ for word in '$(cat $1)'
+ '[' 7 == '' ']'
+ dict[$word]=8
+ for word in '$(cat $1)'
+ '[' 8 == '' ']'
+ dict[$word]=9
+ for word in '$(cat $1)'
+ '[' 9 == '' ']'
+ dict[$word]=10
+ for word in '$(cat $1)'
+ '[' 10 == '' ']'
+ dict[$word]=11
+ for word in '$(cat $1)'
+ '[' 11 == '' ']'
+ dict[$word]=12
+ for word in '${!dict[@]}'
+ echo 0: 12
0: 12

Upvotes: 2

Views: 1840

Answers (1)

Charles Duffy

Reputation: 295291

Using declare -a dict creates an indexed array, so each key is evaluated as an arithmetic expression and the result is used as a numeric index. That's not what you want if you're storing things by word. Use declare -A (an associative array, available in bash 4.0 and later) instead.
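To make the difference concrete, here is a small sketch (the array names indexed and assoc are just for illustration, and it assumes no shell variables named learning, the, or bash happen to be set). In the indexed array every word is treated as an unset variable, which evaluates to 0, so all words land in the same slot. That is why the trace in the question ends with a single key, 0.

```bash
#!/bin/bash
# Requires bash 4.0 or later for declare -A.

declare -a indexed   # indexed array: subscripts are arithmetic expressions
declare -A assoc     # associative array: subscripts are plain strings

for word in learning the bash; do
  # Each word names an unset variable, so the subscript evaluates to 0.
  indexed[$word]=$word
  # Here the word itself is the key.
  assoc[$word]=$word
done

echo "indexed keys: ${!indexed[*]}"
echo "assoc entries: ${#assoc[@]}"
```

The indexed array ends up with one entry at index 0, while the associative array holds three separate entries.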


Also, $[ ] is an exceedingly outdated syntax for math. Even modern POSIX sh supports $(( )), which you should use instead:

dict[$word]=$(( ${dict[$word]} + 1 ))

or, to take advantage of bash-only math syntax:

(( dict[$word]++ ))
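A short sketch of both forms side by side, using a hypothetical array named hits. One thing worth knowing about the (( )) command: its exit status is non-zero when the expression evaluates to 0, which is what a post-increment of an unset element does, so it may need a guard if the script runs under set -e.

```bash
#!/bin/bash
# Requires bash 4.0 or later for declare -A.

declare -A hits   # hypothetical name, used only for this illustration

# Arithmetic expansion; :-0 supplies a default when the key is missing.
hits[bash]=$(( ${hits[bash]:-0} + 1 ))

# bash-only arithmetic command. The first call returns a non-zero status
# because the old value (0) is the value of the expression, hence the guard.
(( hits[shell]++ )) || true
(( hits[shell]++ )) || true

echo "bash: ${hits[bash]}, shell: ${hits[shell]}"
```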

Also, using for word in $(cat $1) is broken in several ways:

  • It doesn't quote $1, so for a filename with spaces, it will split the name into several words and try to open each word as a separate file. To fix only this, you would use $(cat "$1") or $(<"$1") (which is more efficient, as it doesn't require starting the external program cat).
  • It tries to expand the words in the file as globs -- if the file contains *, every file in the current directory will be treated as a word.
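The glob problem is easy to reproduce. The sketch below builds a scratch directory (the file names a.txt, b.txt, and input are made up for the demonstration), then compares the unquoted $(cat ...) loop with read -a, which splits a line into words without expanding globs.

```bash
#!/bin/bash

# Scratch directory with two extra files and a one-line input file.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt"
printf 'one * two\n' > "$workdir/input"
cd "$workdir" || exit 1

# Unquoted command substitution: the * is expanded against the directory.
unsafe=()
for word in $(cat input); do
  unsafe+=("$word")
done

# read -a splits on whitespace only; the * stays a literal word.
safe=()
read -r -a safe < input

printf 'unsafe: %s\n' "${unsafe[*]}"
printf 'safe:   %s\n' "${safe[*]}"
```

The unsafe list picks up every file name in the directory in place of the *, while the safe list contains exactly the three words from the file.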

Instead, use a while loop:

while read -r -d' ' word; do
  if [[ -n ${dict[$word]} ]] ; then
    dict[$word]=$(( ${dict[$word]} + 1 ))
  else
    dict[$word]=1
  fi
done <"$1"
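One caveat with the loop above: with -d' ' only a space ends a word, so a newline stays inside the word (the last word of one line and the first word of the next are read as a single item), and a final word that is not followed by a space makes read return non-zero, so the loop body never sees it. Below is a minimal sketch of a complete script that sidesteps both by reading each line into an array with read -a. The function name count_words is my own, not something from the question, and this is one possible arrangement rather than the only correct one.

```bash
#!/bin/bash
# usage: count.sh file
# Requires bash 4.0 or later for associative arrays.

declare -A dict

# count_words is a hypothetical helper name chosen for this sketch.
count_words() {
  local -a words
  local word
  # read -a splits each line on whitespace without glob expansion.
  # The || clause still processes a last line with no trailing newline.
  while read -r -a words || (( ${#words[@]} )); do
    for word in "${words[@]}"; do
      dict[$word]=$(( ${dict[$word]:-0} + 1 ))
    done
  done <"$1"
}

# Only run when a readable file was given on the command line.
if [[ -n ${1:-} && -r $1 ]]; then
  count_words "$1"
  for word in "${!dict[@]}"; do
    printf '%s: %s\n' "$word" "${dict[$word]}"
  done
fi
```

Note the quoted "${!dict[@]}" in the output loop; the unquoted form from the question would split and glob-expand the keys. The order in which an associative array lists its keys is not specified, so pipe the output through sort if you need a stable order.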

Upvotes: 3
