Unknown

Reputation: 676

Loops with grep script

I'm asking this as a new question because people didn't seem to understand my original question.

I can figure out how to find if a word starts with a capital and is followed by 9 letters with the code:

echo "word" | grep -Eo '^[A-Z][[:alpha:]]{8}'

So that's part 1 of what I'm supposed to do. My actual script is supposed to loop through each word in a text file that is given as the first and only argument, then check if any of those words start with a capital and are 9 letters long.

I've tried:

cat textfile | grep -Eo '^[A-Z][[:alpha:]]{8}'

and

while read p
do echo $p | grep -Eo '^[A-Z][[:alpha:]]{8}' 
done < $1

to no avail.

Although:

cat randomtext.txt 

outputs:

The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha

so it's correctly outputting all the words in the file randomtext.txt

then why wouldn't

cat randomtext.txt | grep -Eo '^[A-Z][[:alpha:]]{8}'

work?

Upvotes: 0

Views: 345

Answers (4)

LSerni

Reputation: 57408

The words are all on one line, but your grep pattern is anchored with ^ to the start of the line, so it can only ever match the first word.

You ought to split the file into words:

sed -e 's/\s*\b\s*/\n/g' < file.txt | grep ...

Or maybe better, since you're only interested in alphanumeric sequences,

sed -e 's/\W\W*/\n/g' < file.txt | grep -E '^[A-Z][[:alpha:]]{8}$'

The $ (end of line) is necessary because otherwise 'Supercalifragilisticexpialidocious' would match.

(I had changed {8} to {9} because you wrote "and is followed by 9 letters", but then I saw you also wrote "and are 9 letters long".)

By the way, if you use {8} and -o, you might be led into thinking a match is there where it isn't. "-o" means "only print the part matching my pattern".

So if you fed "Supercalifragilistic" to "^[A-Z][[:alpha:]]{8}", it would accept it as a match and print "Supercali". This is not what I think you asked.
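The difference between the anchored and unanchored patterns can be seen side by side. This is only a minimal sketch, assuming a POSIX shell with tr and grep available; split_words is a hypothetical helper name that is not part of the question, and LC_ALL=C is set on the assumption that you want plain ASCII ranges for [A-Z].

```shell
# Assumption: force the C locale so [A-Z] means exactly the ASCII capitals.
export LC_ALL=C

# Hypothetical helper: split stdin into one word per line.
# tr -cs turns every run of non-letters into a single newline.
split_words() {
    tr -cs '[:alpha:]' '\n'
}

sample='Supercalifragilistic Abcdefgha Abcdefgh'

# Without $: -o also prints the first 9 letters of the longer word.
unanchored=$(printf '%s\n' "$sample" | split_words | grep -Eo '^[A-Z][[:alpha:]]{8}')

# With $: only words that are exactly 9 letters long match.
anchored=$(printf '%s\n' "$sample" | split_words | grep -E '^[A-Z][[:alpha:]]{8}$')

printf 'unanchored:\n%s\n' "$unanchored"
printf 'anchored:\n%s\n' "$anchored"
```

The unanchored run reports a "Supercali" that never appears as a word in the input, which is the false positive described above.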

Upvotes: 1

Gilles Quénot

Reputation: 185179

You can do this:

$ cat file.txt
The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha
$ printf '%s\n' $(<file.txt) | grep -Eo '^[A-Z][[:alpha:]]{8}$' 
Abcdefgha

If you want to match words anywhere within a line, you need to remove the ^ character (which anchors the match to the beginning of the line):

grep -Eo '\b[A-Z][[:alpha:]]{8}\b' file.txt

(I added \b, as choroba explains.)

Upvotes: 0

iwein

Reputation: 26161

If you cat the file, each whole line is fed to grep at once. You should split the line into words before feeding them to grep.

You could try:

cat randomtext.txt | awk '{ for (i = 1; i <= NF; i++) { print $i } }' | grep -Eo '^[A-Z][a-z]{8}$'

Upvotes: 0

choroba

Reputation: 241908

The problem is in the anchor. Your pattern starts with ^ which matches the beginning of a line, but the word you want to get returned is in the middle of a line. You can replace it with \b to match at a word boundary.
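As a rough sketch of how that could look in the script the question describes (one file name as the only argument): find_caps9 is a hypothetical function name, and \b is assumed to be supported by your grep (it is a GNU extension rather than POSIX). The sample text is the line from the question.

```shell
#!/bin/sh
# Assumption: force the C locale so [A-Z] means exactly the ASCII capitals.
export LC_ALL=C

# Hypothetical helper: print every 9-letter word that starts with a
# capital letter, wherever it appears in the file given as "$1".
# \b (word boundary) replaces ^ so matches are not tied to line starts.
find_caps9() {
    grep -Eo '\b[A-Z][[:alpha:]]{8}\b' -- "$1"
}

# Demo on a temporary copy of the sample line from the question.
demo_file=$(mktemp)
printf '%s\n' 'The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha' > "$demo_file"
find_caps9 "$demo_file"
rm -f "$demo_file"
```

In a real script you would call find_caps9 "$1" instead of the demo lines. The trailing \b plays the same role as $ in the word-per-line approaches: it stops a longer word from matching on its first 9 letters.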

Upvotes: 2
