dev3569

Reputation: 31

grep: finding a string that starts and ends with a specific letter in directory

I am teaching myself commands and different ways to use grep. I know how to search for a string in a directory and its subdirectories, but I am confused when it comes to searching for a pattern where only the first and last letters are fixed.

For example, how could I search for all words (of varying length) that start with a and end with e, so that I could find ape or apple in text files?

EDIT: I am not sure which grep version I am using, but I tried:

grep -nr "a[A-Za-z]*e"

This finds ape and apple, but it also matches apes, which is NOT wanted.
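The reason apes (and also words such as tape) slips through is that the pattern is not tied to word boundaries, so grep reports any line containing a matching substring. A quick sketch using -o (supported by GNU and BSD grep, but not POSIX) shows which part of each line actually matched; the sample words are made up for illustration.

```shell
# Print only the matched part of each line (-o), to show that the
# unanchored pattern a[A-Za-z]*e matches "ape" inside longer words.
matches=$(printf '%s\n' ape apple apes tape | grep -o 'a[A-Za-z]*e')
printf '%s\n' "$matches"
```

All four words produce a match: apple in full, and the substring ape inside ape, apes and tape.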

Upvotes: 3

Views: 14126

Answers (3)

F. Hauri  - Give Up GitHub

Reputation: 71027

Simply:

grep '\ba\w*e\b' 

or

grep --color '\ba\w*e\b'

or

grep -rn '\ba\w*e\b'
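To see the recursive form in action without touching real files, here is a small sketch that builds a throwaway directory tree (the file names and text are made up) and searches it. It assumes GNU grep, since \b and \w are GNU extensions; the output is sorted only because the order in which -r visits files is not guaranteed.

```shell
# Build a temporary tree with made-up sample text.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'I saw an ape\nThe apes ran out.\n' > "$dir/notes.txt"
printf 'An apple a day\n' > "$dir/sub/more.txt"

# -r: recurse into subdirectories; -n: prefix each hit with its line number.
# File names are printed automatically because a directory is searched.
hits=$(grep -rn '\ba\w*e\b' "$dir" | sort)
printf '%s\n' "$hits"

rm -r "$dir"
```

Only two lines are reported: "I saw an ape" and "An apple a day". The line with apes is skipped, and the capitalised An does not match the lowercase a.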

Some explanations

  • As suggested by the question's tags, this answer uses GNU grep: grep (GNU grep) 2.27.
  • The output of man grep | grep -3 '\\b':

    The Backslash Character and Special Expressions
        The  symbols  \<  and  \>  respectively  match  the empty string at the
        beginning and end of a word.  The symbol \b matches the empty string at
        the  edge  of a word, and \B matches the empty string provided it's not
        at the edge of a word.  The symbol \w is a synonym for [_[:alnum:]] and
        \W is a synonym for [^_[:alnum:]].
    

    This shows that:

    • \b means the edge of a word
    • \w means [_[:alnum:]]
    • a and e are letters
    • you may already know that * means "the preceding item will be matched zero or more times" (from elsewhere in the same man page: man grep | grep '^ *\*')
    • ... and finally, this could also be written as:

      grep '\<a\w*e\>'
      

      where

      the symbols \< and \> respectively match the empty string at the beginning and end of a word.

      This has nearly the same effect, but its description (beginning and end of a word) corresponds exactly to the title of this question: grep: finding a string that starts and ends with a specific letter in directory
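A short comparison of the two forms on a few made-up words, assuming GNU grep. It prints only the matched text (-o) for each form and checks that both agree.

```shell
# Made-up sample words, one per line.
sample() { printf '%s\n' ape apes apple tape ae a_1e; }

# \b...\b form versus \<...\> form; -o prints only the matched text.
with_b=$(sample | grep -o '\ba\w*e\b')
with_angle=$(sample | grep -o '\<a\w*e\>')

printf '%s\n' "$with_b"
[ "$with_b" = "$with_angle" ] && echo 'same result'
```

Both forms accept ae (because * allows zero characters between a and e) and a_1e (because \w also covers digits and the underscore). If only letters should count, \w can be replaced with [[:alpha:]].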

Upvotes: 1

Jonathan Leffler

Reputation: 755044

As alluded to in various comments, it is possible to do this using POSIX standard grep -E, but it is not all that notationally convenient.

I used a script file grep-ape.sh containing:

grep -E -e '(^|[^[:alpha:]])a[[:alpha:]]+e($|[^[:alpha:]])' "$@"

The -E enables extended regular expressions. The -e is optional, but it lets me pass extra options (through "$@") after the regular expression; without it, a strictly POSIX grep would treat those options as file names. The regular expression looks for either 'start of line' or a non-alpha character, followed by a, one or more additional alpha characters, an e, and either 'end of line' or a non-alpha character.
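For a quick experiment without creating a script file, the same command can be wrapped in a shell function. This is only a sketch: grep_ape is a hypothetical name, and the sample lines are made up. Note that -n is passed after the pattern via "$@", which is exactly the case the -e option is there for.

```shell
# Same command as grep-ape.sh, as a function (grep_ape is a hypothetical name).
# -E: extended regex; -e: the next argument is the pattern, so options that
# arrive later in "$@" are still parsed as options.
grep_ape() {
    grep -E -e '(^|[^[:alpha:]])a[[:alpha:]]+e($|[^[:alpha:]])' "$@"
}

# Made-up sample lines; -n is passed after the pattern.
result=$(printf '%s\n' 'ape' 'apes' 'tape' 'an apple' 'ae' | grep_ape -n)
printf '%s\n' "$result"
```

Only lines 1 (ape) and 4 (an apple) are reported. Unlike the \w* patterns, + requires at least one letter between a and e, so ae is not matched.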

Given the data file (called, unimaginatively, data):

I want to tape the apes that ate the grapes.
ape at the start.
Ending with ape
Situating ape in the middle
And an apple too.
But not apples, no way.
The tape ran out.
The apes ran out.
The grapes ran out.
They ate them.

I could run grep-ape.sh -n data (demonstrating the usefulness of the -e option, though GNU systems will permute options so you don't necessarily spot the problem), and got:

1:I want to tape the apes that ate the grapes.
2:ape at the start.
3:Ending with ape
4:Situating ape in the middle
5:And an apple too.
10:They ate them.

Using a non-POSIX option -o (supported by GNU and BSD versions of grep) to print only what is matched, I can get the output:

$ grep-ape.sh -n -o data
1: ate 
2:ape 
3: ape
4: ape 
5: apple 
10: ate 
$

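This shows that the regular expression picks up the acceptable words even on lines that also contain unacceptable ones (line 1 contains tape, apes and grapes, yet ate is accepted), while lines containing only unacceptable words (6 to 9) are not reported. Note that each match includes the delimiting space, because the pattern consumes the non-alpha character (if any) on each side of the word.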
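If only the bare words are wanted, the delimiters that the pattern consumes can be trimmed afterwards. The sed step below is an addition of mine, not part of grep-ape.sh, and the sample uses four lines from the data file above; it is a sketch, not the only way to do this.

```shell
pattern='(^|[^[:alpha:]])a[[:alpha:]]+e($|[^[:alpha:]])'

# -o prints each match with its surrounding delimiter; sed then removes
# one leading and one trailing non-alpha character from each match.
words=$(printf '%s\n' \
        'I want to tape the apes that ate the grapes.' \
        'ape at the start.' \
        'Ending with ape' \
        'And an apple too.' |
    grep -E -o -e "$pattern" |
    sed 's/^[^[:alpha:]]//; s/[^[:alpha:]]$//')
printf '%s\n' "$words"
```

The result is ate, ape, ape and apple, one per line, without the surrounding spaces.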

Upvotes: 0

BodneyC

Reputation: 110

I suppose you could use:

find . -type f -name '*.txt' -exec cat {} \; | grep 'a[A-Za-z]\+e'

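That should cat every .txt file in the current directory and its subdirectories, and grep the combined text for "a, then one or more letters, then e" (the output will not show which file each match came from).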

The [A-Za-z] matches a single letter of either case, and \+ (a GNU extension to basic regular expressions) means "one or more of them".
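A tiny sketch of what 'a[A-Za-z]\+e' does on its own, assuming GNU grep (for \+). The words are made up; -o prints only the matched part.

```shell
# -o prints only the matched part of each line.
found=$(printf '%s\n' ae ape apes tape | grep -o 'a[A-Za-z]\+e')
printf '%s\n' "$found"
```

The word ae is skipped because \+ needs at least one letter between a and e, while apes and tape still yield ape because nothing ties the pattern to word boundaries. That is what the word-boundary edit addresses.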

I think that's what you're after?

Edit:

Word boundaries:

find . -type f -name '*.txt' -exec cat {} \+ | grep '\ba[A-Za-z]\+e\b'
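One drawback of piping through cat is that the file names are lost. A variant (my own sketch, assuming GNU grep) runs grep directly from find, with -H to always print the file name and -n for line numbers. The temporary files are made up for the demonstration, and the output is sorted because find's traversal order is not fixed.

```shell
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'I saw an ape\nThe apes ran out.\n' > "$dir/a.txt"
printf 'An apple a day\n' > "$dir/sub/b.txt"
printf 'ape in a log\n' > "$dir/c.log"   # skipped: not *.txt

# grep receives the file names itself, so each hit keeps its origin.
hits=$(find "$dir" -type f -name '*.txt' \
    -exec grep -Hn '\ba[A-Za-z]\+e\b' {} + | sort)
printf '%s\n' "$hits"

rm -r "$dir"
```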

Upvotes: 0
