Reputation: 142
I'm trying to count the words with at least two vowels in all the .txt files in the directory. Here's my code so far:
#!/bin/bash
wordcount=0
for i in $HOME/*.txt
do
cat $i |
while read line
do
for w in $line
do
if [[ $w == .*[aeiouAEIOU].*[AEIOUaeiou].* ]]
then
wordcount=`expr $wordcount + 1`
echo $w ':' $wordcount
else
echo "In else"
fi
done
done
echo $i ':' $wordcount
wordcount=0
done
Here is my sample from a txt file
Last modified: Sun Aug 20 18:18:27 IST 2017
To remove PPAs
sudo apt-get install ppa-purge
sudo ppa-purge ppa:
The problem is it doesn't match the pattern in the if statement for all the words in the text file. It goes directly to the else statement. And secondly, the wordcount in echo $i ':' $wordcount is equal to 0 which should be some value.
Upvotes: 4
Views: 11720
Reputation: 6330
Using grep - this is pretty simple to do.
#!/bin/bash
wordcount=0
for file in ./*.txt
do
count=`cat $file | xargs -n1 | grep -ie "[aeiou].*[aeiou]" | wc -l`
wordcount=`expr $wordcount + $count`
done
echo $wordcount
Upvotes: 0
Reputation: 295845
[[ $string = $pattern ]]
doesn't perform regex matching; instead, it's a glob-style pattern match. While .
means "any character" in regex, it matches only itself in glob.
You have a few options here:
Use =~
instead to perform regular expression matching:
[[ $w =~ .*[aeiouAEIOU].*[AEIOUaeiou].* ]]
Use a glob-style expression instead of a regex:
[[ $w = *[aeiouAEIOU]*[aeiouAEIOU]* ]]
Note the use of =
rather than ==
here; while either is technically valid, the former avoids building finger memory that would lead to bugs when writing code for a POSIX implementation of test
/ [
, as =
is the only valid string comparison operator there.
Using for w in $line
is innately unsafe. Use read -a
to read a line into an array of words:
#!/usr/bin/env bash
wordcount=0
for i in "$HOME"/*.txt; do
while read -r -a words; do
for word in "${words[@]}"; do
if [[ $word = *[aeiouAEIOU]*[aeiouAEIOU]* ]]; then
(( ++wordcount ))
fi
done
done <"$i"
printf '%s: %s\n' "$i" "$wordcount"
wordcount=0
done
Upvotes: 7
Reputation: 113994
Try:
awk '/[aeiouAEIOU].*[AEIOUaeiou]/{n++} ENDFILE{print FILENAME":"n; n=0}' RS='[[:space:]]' *.txt
Sample output looks like:
$ awk '/[aeiouAEIOU].*[AEIOUaeiou]/{n++} ENDFILE{print FILENAME":"n; n=0}' RS='[[:space:]]' *.txt
one.txt:1
sample.txt:9
How it works:
/[aeiouAEIOU].*[AEIOUaeiou]/{n++}
Every time we find a word with two vowels, we increment variable n
.
ENDFILE{print FILENAME":"n; n=0}
At the end of each file, we print the name of the file and the 2-vowel word count n
. We then reset n
to zero.
RS='[[:space:]]'
This tells awk to use any whitespace as a word separator. This makes each word into a record. Awk reads the input one record at a time.
The use of awk avoids a multitude of shell issues. For example, consider the line for w in $line
. This will not work the way you hope. Consider a directory with these files:
$ ls
one.txt sample.txt
Now, let's take line='* Item One'
and see what happens:
$ line='* Item One'
$ for w in $line; do echo "w=$w"; done
w=one.txt
w=sample.txt
w=Item
w=One
The shell treats the *
in line
as a wildcard and expands it into a list of files. Odds are you didn't want this. The awk solution avoids a variety of issues like this.
Upvotes: 1