Reputation: 676
I'm asking this as a new question because people didn't seem to understand my original question.
I can figure out how to find if a word starts with a capital and is followed by 9 letters with the code:
echo "word" | grep -Eo '^[A-Z][[:alpha:]]{8}'
So that's part 1 of what I'm supposed to do. My actual script is supposed to loop through each word in a text file that is given as the first and only argument, then check if any of those words start with a capital and are 9 letters long.
I've tried:
cat textfile | grep -Eo '^[A-Z][[:alpha:]]{8}'
and
while read p
do echo $p | grep -Eo '^[A-Z][[:alpha:]]{8}'
done < $1
to no avail.
Although:
cat randomtext.txt
outputs:
The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha
so it's correctly outputting all the words in the file randomtext.txt.
Then why wouldn't
cat randomtext.txt | grep -Eo '^[A-Z][[:alpha:]]{8}'
work?
Upvotes: 0
Views: 345
Reputation: 57408
The words are all on one line, one after the other, but the ^ in your grep expression anchors the match to the start of the whole line.
You ought to split the file into words:
sed -e 's/\s*\b\s*/\n/g' < file.txt | grep ...
Or maybe better, since you're only interested in alphanumeric sequences,
sed -e 's/\W\W*/\n/g' < file.txt | grep -E '^[A-Z][[:alpha:]]{8}$'
The $ (end of line) is necessary because otherwise 'Supercalifragilisticexpialidocious' would match.
(I had changed {8} to {9} because you specified "and is followed by 9 letters", but then I saw you also state "and are 9 letters long".)
By the way, if you use {8} and -o, you might be led into thinking a match is there where it isn't. "-o" means "only print the part matching my pattern".
So if you fed "Supercalifragilistic" to "^[A-Z][[:alpha:]]{8}", it would accept it as a match and print "Supercali". I don't think this is what you asked for.
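To make the difference concrete, here is a small sketch that runs both patterns on that one word. The variable names are mine, and the `|| true` is only there so a non-matching grep does not abort a script running under `set -e`.

```shell
# Without the trailing $, -o prints just the nine characters that matched,
# even though the word itself is much longer.
prefix_match=$(echo "Supercalifragilistic" | grep -Eo '^[A-Z][[:alpha:]]{8}' || true)
echo "without \$: '$prefix_match'"

# With the trailing $, the whole line must be exactly nine letters,
# so the 20-letter word is rejected and grep prints nothing.
exact_match=$(echo "Supercalifragilistic" | grep -Eo '^[A-Z][[:alpha:]]{8}$' || true)
echo "with \$:    '$exact_match'"
```

The first variable ends up holding "Supercali" and the second stays empty.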
Upvotes: 1
Reputation: 185179
You should do this:
$ cat file.txt
The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha
$ printf '%s\n' $(<file.txt) | grep -Eo '^[A-Z][[:alpha:]]{8}$'
Abcdefgha
If you want to work on the same source line, you need to remove the ^ character (which means the beginning of the line):
grep -Eo '\b[A-Z][[:alpha:]]{8}\b' file.txt
(added \b like choroba explains)
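Since the question says the file arrives as the first and only argument, the \b version could be wrapped up roughly like this. This is only a sketch: the function name nine_letter_words is made up for illustration, and it assumes a grep that supports -o and \b (GNU grep does).

```shell
# nine_letter_words: hypothetical helper name, not from the original posts.
# Prints every word in the given file that starts with a capital letter
# and is exactly nine letters long, one match per line.
nine_letter_words() {
    if [ "$#" -ne 1 ] || [ ! -r "$1" ]; then
        echo "usage: nine_letter_words FILE" >&2
        return 1
    fi
    # \b matches at a word boundary, so a match may sit anywhere in a line.
    grep -Eo '\b[A-Z][[:alpha:]]{8}\b' "$1"
}
```

Inside a script you would call it as `nine_letter_words "$1"`. Note that grep exits with status 1 when nothing matches, so the function does too.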
Upvotes: 0
Reputation: 26161
If you cat the file, the whole line is fed to grep at once. You should split the line into words before feeding them to grep.
You could try:
cat randomtext | awk '{ for(i=1; i <= NF; i++) {print $i } }' | grep -Eo '^[A-Z][a-z]{8}$'
Upvotes: 0
Reputation: 241908
The problem is in the anchor. Your pattern starts with ^, which matches the beginning of a line, but the word you want returned is in the middle of a line. You can replace it with \b to match at a word boundary.
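A quick way to see the two anchors side by side, using the sample line from the question. Treat it as a sketch: the variable names are mine, \b is a GNU grep extension, and I added a closing \b so that longer words are not cut down to their first nine letters.

```shell
line='The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha'

# ^ only allows a match at the start of the line, where "The" is too short.
with_caret=$(echo "$line" | grep -Eo '^[A-Z][[:alpha:]]{8}' || true)

# \b allows a match at any word boundary, so a word in mid-line can be found.
with_boundary=$(echo "$line" | grep -Eo '\b[A-Z][[:alpha:]]{8}\b' || true)

echo "with ^:  '$with_caret'"
echo "with \\b: '$with_boundary'"
```

With the sample line, the ^ version finds nothing and the \b version finds Abcdefgha.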
Upvotes: 2