NoobEditor

Reputation: 15881

Confusion in regex pattern for search

I am learning regex in bash and trying to fetch all lines that end with .com.

Initially I did:

cat patternNpara.txt | egrep "^[[:alnum:]]+(.com)$"

Why: + matches one or more occurrences, so placing it after alnum should fetch any run of digits, words, or signs, but apparently this logic is failing.

Then I did this (purely trial and error, not really applying any logic) and it worked:

cat patternNpara.txt | egrep "^[[:alnum:]].+(.com)$"

What is confusing me: . matches only a single character, so how am I getting the output? I mean, how is it really matching the pattern?

Question: what is the difference between [[:alnum:]]+ and [[:alnum:]].+ (this one has . in it) in the above pattern, and how does it work?

PS: I am looking for an explanation, not a "try it this way" answer. :)

Some test lines from the file patternNpara.txt which are fetched as output:

valid email = [email protected]
invalid email = ab@abccom
another invalid = [email protected]
1 : abc,s,[email protected]
2: [email protected]

Upvotes: 1

Views: 63

Answers (3)

Dr.Kameleon

Reputation: 22820

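Try this (with a positive lookahead; this needs a PCRE engine such as grep -P, since egrep's POSIX extended syntax does not support lookaheads):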

.+(?=\.com)

Demo :

http://regexr.com?38bo0

Upvotes: 0

Lee Duhem

Reputation: 15121

If you want to match any lines that end with '.com', you should use

egrep ".*\.com$" file.txt
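
A minimal sketch of why the backslash matters, assuming a few made-up sample lines (the addresses are hypothetical, not the ones from the question) and using grep -E, which accepts the same syntax as egrep:

```shell
# Hypothetical sample lines; the addresses are made up for illustration.
sample='valid email = user@example.com
invalid email = ab@abccom
trailing text = user@example.com and more'

# Escaped dot: the line must end with a literal ".com".
escaped=$(printf '%s\n' "$sample" | grep -E '.*\.com$')

# Unescaped dot: any character may precede "com", so "ab@abccom" also matches.
unescaped=$(printf '%s\n' "$sample" | grep -E '.*.com$')

printf 'escaped:\n%s\n\nunescaped:\n%s\n' "$escaped" "$unescaped"
```

With these sample lines, the escaped pattern should keep only the first line, while the unescaped one also lets the line ending in "abccom" through.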

To match all the following lines

valid email = [email protected]
invalid email = ab@abccom
another invalid = [email protected]
1 : abc,s,[email protected]
2: [email protected]

^[[:alnum:]].+(.com)$ will work, but ^[[:alnum:]]+(.com)$ will not. Here are the reasons:

  1. ^[[:alnum:]].+(.com)$ matches strings that start with one character from a-zA-Z or 0-9, followed by two or more characters of any kind (one or more from .+, plus one from the unescaped . before com), and end with 'com' (not '.com').
  2. ^[[:alnum:]]+(.com)$ matches strings that start with one or more characters from a-zA-Z or 0-9, followed by exactly one character that could be anything, and end with 'com' (not '.com'). The test lines contain spaces, '=' and '@' before the ending, so they cannot match.
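
The two readings above can be checked with a small sketch. The input lines are hypothetical (the addresses are made up, and "abcXcom" and "a.com" are added only to probe the edge cases), and grep -E is used in place of egrep:

```shell
# Hypothetical test lines; the addresses are made up for illustration.
lines='valid email = user@example.com
invalid email = ab@abccom
abcXcom
a.com'

# Pattern 1: one alnum, then one or more of anything, then any char + "com".
with_dot=$(printf '%s\n' "$lines" | grep -E '^[[:alnum:]].+(.com)$')

# Pattern 2: only alnum characters, then any single char + "com".
without_dot=$(printf '%s\n' "$lines" | grep -E '^[[:alnum:]]+(.com)$')

printf 'with .+ :\n%s\n\nwithout . :\n%s\n' "$with_dot" "$without_dot"
```

Under these assumptions, the first pattern should select the first three lines ("a.com" is too short for it), and the second should select only "abcXcom" and "a.com", because every other line contains a space before the ending.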

Upvotes: 1

anubhava

Reputation: 785098

Looking at your screenshot, it seems you're trying to match email addresses, which also contain the @ character that is not included in your regex. You can use this regex:

egrep "[@[:alnum:]]+(\.com)" patternNpara.txt

Difference between the two regexes:

  • [[:alnum:]] matches only [a-zA-Z0-9]. If you have @ or , then you need to include them in the character class as well.
  • Your 2nd case includes the .+ pattern, which means one or more occurrences of ANY CHARACTER.
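
To see what the character class actually consumes, here is a rough sketch that prints only the matched part of a line. It assumes a grep that supports the -o option (GNU and BSD grep do; it is not required by POSIX), and the input line is hypothetical:

```shell
# Hypothetical input line; the address is made up for illustration.
line='1 : abc,s,user@example.com'

# Without "@" in the bracket expression, the match can only start after the "@".
plain=$(printf '%s\n' "$line" | grep -oE '[[:alnum:]]+(\.com)')

# With "@" added, the match extends back through the "@".
with_at=$(printf '%s\n' "$line" | grep -oE '[@[:alnum:]]+(\.com)')

printf '%s\n%s\n' "$plain" "$with_at"
```

For this input, the first command should print "example.com" and the second "user@example.com", which shows that [[:alnum:]]+ stops at the first character outside a-zA-Z0-9.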

Upvotes: 1
