Reputation: 31
I am teaching myself grep and the different ways to use it. I know how to search for a string in a directory and its subdirectories, but I am confused about how to search for a string with a variable-length gap in the middle.
For example: how could I search for all words (of any length) that start with a and end with e, so that I find ape or apple in text files?
EDIT/UPDATE: I am not sure which version of grep I am using, but I tried:
grep -nr "a[A-Za-z]*e"
This finds ape and apple, but it also matches apes, which is NOT wanted.
Upvotes: 3
Views: 14126
Reputation: 71027
grep '\ba\w*e\b'
or
grep --color '\ba\w*e\b'
or
grep -rn '\ba\w*e\b'
Tested with GNU grep: grep (GNU grep) 2.27.
The output of the command man grep | grep -3 '\\b' is:
The Backslash Character and Special Expressions The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].
To break the pattern down:
\b means the edge of a word.
\w means [_[:alnum:]].
a and e are literal letters.
* means the preceding item will be matched zero or more times (elsewhere in the same man page: man grep | grep '^ *\*').
Finally, this could also be written:
grep '\<a\w*e\>'
where
The symbols \< and \> respectively match the empty string at the beginning and end of a word.
This has nearly the same effect, but its description corresponds strictly to the title of this question: grep: finding a string that starts and ends with a specific letter in directory
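To see the effect of the word boundaries, here is a minimal sketch. It assumes GNU grep (for \<, \>, \w and -o), and the sample words and the function name sample_words are made up for illustration:

```shell
# Made-up sample input, one entry per line.
sample_words() {
    printf '%s\n' ape apple apes grape tape 'an ape here' ate
}

# Without boundaries: every line that contains a...e anywhere is printed,
# including apes, grape and tape.
sample_words | grep 'a[A-Za-z]*e'

# With word boundaries and -o: only whole words that start with a
# and end with e are printed.
sample_words | grep -o '\<a\w*e\>'
```

Note that \w also accepts digits and underscores, so this treats something like a_1e as a word too; use [A-Za-z] instead of \w if only letters should count.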
Upvotes: 1
Reputation: 755044
As alluded to in various comments, it is possible to do this using POSIX standard grep -E, but it is not all that notationally convenient.
I used a script file grep-ape.sh containing:
grep -E -e '(^|[^[:alpha:]])a[[:alpha:]]+e($|[^[:alpha:]])' "$@"
The -E enables extended regular expressions. The -e is optional, but allows me to add extra options as 'file names' after the regular expression. The regular expression looks for either 'start of line' or a non-alpha character, followed by a, one or more additional alpha characters, an e and either 'end of line' or a non-alpha character.
Given the data file (called, unimaginatively, data):
I want to tape the apes that ate the grapes.
ape at the start.
Ending with ape
Situating ape in the middle
And an apple too.
But not apples, no way.
The tape ran out.
The apes ran out.
The grapes ran out.
They ate them.
I could run grep-ape.sh -n data (demonstrating the usefulness of the -e option, though GNU systems will permute options so you don't necessarily spot the problem), and got:
1:I want to tape the apes that ate the grapes.
2:ape at the start.
3:Ending with ape
4:Situating ape in the middle
5:And an apple too.
10:They ate them.
Using a non-POSIX option -o (supported by GNU and BSD versions of grep) to print only what is matched, I can get the output:
$ grep-ape.sh -n -o data
1: ate
2:ape
3: ape
4: ape
5: apple
10: ate
$
This shows that the regular expression picks up only the acceptable words, even on lines that also contain words that are not acceptable (such as tape, apes and grapes on line 1).
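Because the expression consumes the delimiter on either side of the word, the -o output above includes stray spaces. One possible way to trim them is sketched below; the function names match_a_e, ae_words and sample_lines are hypothetical, the input is a shortened version of the data file, and -o remains a non-POSIX option:

```shell
# Same regular expression as grep-ape.sh, wrapped in a function
# (hypothetical name) so extra options can be passed after it.
match_a_e() {
    grep -E -e '(^|[^[:alpha:]])a[[:alpha:]]+e($|[^[:alpha:]])' "$@"
}

# Print only the matched words: -o emits each match with its surrounding
# delimiter, then tr deletes everything except letters and newlines.
ae_words() {
    match_a_e -o | tr -cd '[:alpha:]\n'
}

# A few lines borrowed from the data file above.
sample_lines() {
    printf '%s\n' 'The tape ran out.' 'ape at the start.' \
        'But not apples, no way.' 'And an apple too.'
}

sample_lines | match_a_e   # whole matching lines
sample_lines | ae_words    # just the words, delimiters stripped
```

Since each match consumes its trailing delimiter, two acceptable words separated by a single space may not both be reported by -o; the boundary operators in GNU grep avoid that, at the cost of portability.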
Upvotes: 0
Reputation: 110
I suppose you could use:
find . -type f -name '*.txt' -exec cat {} \; | grep 'a[A-Za-z]\+e'
That should cat every .txt file under the current directory, recursively, and grep the combined output for "a, then some letters, then e".
The [A-Za-z] matches a single letter of either case, and the \+ says "one or more of them".
I think that's what you're after?
Edit:
Word boundaries:
find . -type f -name '*.txt' -exec cat {} \+ | grep '\ba[A-Za-z]\+e\b'
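Piping everything through cat loses track of which file each match came from. A possible variation, sketched here under the assumption of GNU grep (for \b and \+; -H is also supported by BSD grep), runs grep on the files directly. The directory layout, file names and the function name search_txt are invented for the demonstration:

```shell
# Build a throwaway directory tree (names and contents are made up).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'an ape and some apes\n' > "$demo_dir/sub/notes.txt"
printf 'apple\n' > "$demo_dir/ignored.log"

# Run grep on the .txt files themselves so every match is labelled
# with its file name (-H) and line number (-n).
search_txt() {
    find "$1" -type f -name '*.txt' \
        -exec grep -H -n '\ba[A-Za-z]\+e\b' {} +
}

search_txt "$demo_dir"

# Remove the throwaway tree again.
rm -rf "$demo_dir"
```

Only sub/notes.txt is searched here; ignored.log is skipped by the -name '*.txt' test even though it contains a matching word.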
Upvotes: 0