user1304473

Reputation: 39

In the Linux Bourne shell: how to count the occurrences of a specific word in a file

By word, I mean any whitespace-delimited string.

Suppose the file test.txt has the following words delimited by spaces:

hello hello hello hell osd
hello
hello 
hello
hellojames beroo helloooohellool axnber hello
way
how 

I want to count the number of times the word hello appears in each line.

I used the command awk -F "hello" '{print NF-1}' test.txt to show the number of occurrences of the word hello in each line:

3
1
1
1
4
0
0

So it finds a total of 3+1+1+1+4 = 10 occurrences.

The problem is on the fifth line: hello occurs only once as a separate word there; words such as hellojames and helloooohellool should not be counted, because in them hello is not delimited by whitespace.

So I want it to find 7 occurrences of hello as a separate word.

Can you help me write a command that returns the correct total of 7 times?

Upvotes: 3

Views: 13810

Answers (7)

Otto

Reputation: 88

You only need to change the needle and file variables:

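#!/usr/bin/env sh

needle="|"
file="file_example.txt"

counter=0
# Read the file line by line; IFS= and -r keep whitespace and backslashes intact,
# and blank lines are still counted in the line number.
while IFS= read -r line
do
    counter=$((counter + 1))
    # grep -o prints each match on its own line; wc -l counts those lines.
    count=$(printf '%s\n' "$line" | grep -o -- "$needle" | wc -l)
    echo "$counter|$count"
done < "$file"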

It prints the line number and the number of occurrences on that line, separated by a pipe character.

Upvotes: 0

Parul

Reputation: 1

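tr -s '[:space:]' '\n' < "$FileName" | grep -x -- "$word" | wc -l

This command turns every run of whitespace into a newline, so each word lands on its own line. grep -x then keeps only the lines that consist of exactly the given word (a plain grep would also match hellojames), and wc -l counts them.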

Upvotes: 0

glenn jackman

Reputation: 246807

grep -o '\<hello\>' filename | wc -l

The \< and \> bits are word boundary patterns, so the expression won't find foohello or hellobar.

With GNU awk, you can also use awk -F '\\<hello\\>' ... to achieve the same effect.
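
One caveat worth checking against the question's definition of a word: \< and \> also treat punctuation as a boundary, so a token such as hello-world or hello, is counted too. The sketch below compares the boundary match with a plain field comparison on a single made-up line. The function names count_by_boundary and count_by_field and the sample line are illustrative assumptions, and \<, \> and -o assume GNU grep.

```shell
# count_by_boundary: count hello using word-boundary anchors (reads stdin).
count_by_boundary() {
    # -o prints each match on its own line; wc -l counts those lines.
    grep -o '\<hello\>' | wc -l
}

# count_by_field: count whitespace-delimited fields that are exactly "hello".
count_by_field() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "hello") c++ } END { print c + 0 }'
}

# One bare "hello" plus two tokens where hello touches punctuation.
line='hello hello-world hello,'

printf '%s\n' "$line" | count_by_boundary   # counts all three tokens
printf '%s\n' "$line" | count_by_field      # counts only the bare "hello"
```

For the sample file in the question both approaches agree, since it contains no punctuation.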

Upvotes: 3

Adam Liss

Reputation: 48290

Solution:

sed 's/\s\+/\n/g' test.txt | grep -w hello | wc -l

Explanation:

sed 's/\s\+/\n/g' test.txt

This replaces every span of whitespace with a newline, effectively reformatting the file test.txt so it has one word per line. The command sed 's/FIND/REPLACE/g' replaces the FIND pattern with REPLACE everywhere it appears. The pattern \s\+ means "one or more whitespace characters", and \n is a newline.

grep -w hello

This extracts only those lines that contain hello as a complete word.

wc -l

This counts the number of lines.


If you want to count the number of occurrences per line, you can use the same technique, but process one line at a time:

while IFS= read -r line; do
  printf '%s\n' "$line" | sed 's/\s\+/\n/g' | grep -w hello | wc -l
done < test.txt

Upvotes: 2

jlliagre

Reputation: 30813

a=$(printf "\01")
b=hello
sed -e "s/\<$b\>/ $a /g" -e "s/[^$a]//g" -e "s/$a/ $b /g" file | wc -w
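
One way to read the three sed expressions is sketched below with comments. The function name count_word_sed is hypothetical, \< and \> assume GNU sed, and the word is assumed to contain no slashes or regex metacharacters. The marker is also assumed not to occur anywhere in the file.

```shell
# count_word_sed WORD FILE: hypothetical wrapper around the same three-step idea.
count_word_sed() {
    # A control character used as a placeholder for each matched word.
    marker=$(printf '\001')

    # Step 1: replace every whole-word occurrence of WORD with the marker.
    # Step 2: delete every character that is not the marker.
    # Step 3: turn each marker back into WORD, padded with spaces,
    #         so that wc -w sees one word per original match.
    sed -e "s/\<$1\>/ $marker /g" \
        -e "s/[^$marker]//g" \
        -e "s/$marker/ $1 /g" "$2" | wc -w
}
```

Called as count_word_sed hello test.txt on the question's sample file, this should report 7 with GNU sed.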

Upvotes: 0

dj_segfault

Reputation: 12419

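# Start at 0 so the script prints 0 rather than an empty line when nothing matches.
helloCount=0
for word in $(cat test.txt); do
  # [ ... ] with = works in a plain Bourne-style sh; [[ ... ]] needs bash.
  if [ "$word" = hello ]; then
    helloCount=$((helloCount + 1))
  fi
done

echo "$helloCount"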

Upvotes: 0

Kevin

Reputation: 56059

awk '{ for(i=1; i<=NF; i++) if($i=="hello") c++ } END{ print c+0 }' file.txt

If you need a count for every line:

awk '{ c=0; for(i=1; i<=NF; i++) if($i=="hello") c++; print c }' file.txt
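
If both the per-line counts and the total are wanted, the two commands can be combined into a single pass. This is only a sketch: the function name word_report and the "total:" label are my own assumptions, and the word is passed in with -v, which means awk interprets backslash escapes in it.

```shell
# word_report WORD FILE: hypothetical helper printing one count per line,
# followed by a final "total: N" line.
word_report() {
    awk -v w="$1" '
        {
            c = 0                      # reset the counter for each line
            for (i = 1; i <= NF; i++)  # fields are whitespace-delimited words
                if ($i == w) c++
            print c
            total += c
        }
        END { print "total:", total + 0 }
    ' "$2"
}
```

For the sample file in the question, word_report hello test.txt should print 3, 1, 1, 1, 1, 0, 0 on separate lines and then total: 7.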

Upvotes: 6
