sodiumnitrate

Reputation: 3131

Regex in python not matching what grep does

I have the following string:

string1; 1.8w/v PEG_8000; string2; ;;

I want to get the ; 1.8w/v PEG_8000; part. I tried the following:

a = re.search(';[^.;]+PEG[^.;]+;', 'string1; 1.8w/v PEG_8000; string2; ;;')

which leaves a set to None.

What am I missing?

(OS X Yosemite, Python 2.7)

Edit: I previously said the following, which I discovered not to be true. I forgot that I edited the string before I tried this.

The funny thing is, if I do grep -E --color ';[^.;]+PEG[^.;]+;' file, where file contains the string, grep highlights the match.

Edit 2: I have a huge file with such strings, where the keyword PEG does not necessarily appear in the second field. That is why I don't use split(';').

Upvotes: 1

Views: 102

Answers (3)

Avinash Raj

Reputation: 174726

You need to remove the dot from the first character class. There is a dot (in 1.8) between the semicolon and the substring PEG, and [^.;] cannot match it, so the regex fails. Note that a dot inside a character class matches only a literal dot.

>>> re.search(r';[^;]+PEG[^.;]+;','string1; 1.8w/v PEG_8000; string2; ;;').group()
'; 1.8w/v PEG_8000;'
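
To make the failure concrete, here is a minimal sketch that runs the original pattern and the corrected one side by side on the string from the question. The variable names original and fixed are only illustrative.

```python
import re

s = 'string1; 1.8w/v PEG_8000; string2; ;;'

# Original pattern: [^.;]+ cannot cross the '.' in "1.8",
# so no run starting at a ';' ever reaches "PEG".
original = re.search(r';[^.;]+PEG[^.;]+;', s)

# Corrected pattern: only ';' is excluded before "PEG".
fixed = re.search(r';[^;]+PEG[^.;]+;', s)

print(original)       # None
print(fixed.group())  # ; 1.8w/v PEG_8000;
```

The second class still excludes the dot, which is harmless here because nothing between PEG and the next semicolon contains one.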

Upvotes: 1

hwnd

Reputation: 70732

A negated character class matches any character except the ones listed. So [^.;] cannot match the literal . in 1.8, and that is what makes the search fail. You can remove the dot from both classes:

>>> import re
>>> s = 'string1; 1.8w/v PEG_8000; string2; ;;'
>>> re.search(';[^;]+PEG[^;]+;', s).group()
'; 1.8w/v PEG_8000;'
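
Since the question mentions a large file where PEG can show up in different fields, the same pattern can be compiled once and applied line by line. This is only a sketch: the lines list stands in for the real file, and its second and third entries are made-up sample data, not taken from the question.

```python
import re

# Same pattern as above, compiled once for reuse.
PEG_FIELD = re.compile(r';[^;]+PEG[^;]+;')

# Stand-in for the lines of the real file (last two lines are hypothetical).
lines = [
    'string1; 1.8w/v PEG_8000; string2; ;;',
    'string1; string2; 2.5w/v PEG_400; ;;',
    'string1; string2; string3; ;;',
]

# Keep the matched text for every line that contains a PEG field.
matches = [m.group() for m in map(PEG_FIELD.search, lines) if m]

print(matches)
```

Note that the pattern requires a semicolon on both sides, so a PEG entry in the very first field of a line would not be matched.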

Upvotes: 2

Mark Tolonen

Reputation: 177785

A way without re:

>>> s='string1; 1.8w/v PEG_8000; string2; ;;'
>>> ';'+s.split(';')[1]+';'
'; 1.8w/v PEG_8000;'
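
The snippet above assumes PEG sits in the second field. If it can be in any field, as Edit 2 of the question says, a split-based version can scan the fields instead. This is a rough sketch; find_field is a hypothetical helper name, not something from the question or from the standard library.

```python
def find_field(line, keyword='PEG'):
    """Return the first ';'-delimited field containing keyword,
    wrapped in ';' on both sides, or None if no field contains it."""
    for field in line.split(';'):
        if keyword in field:
            return ';' + field + ';'
    return None

s = 'string1; 1.8w/v PEG_8000; string2; ;;'
print(find_field(s))  # ; 1.8w/v PEG_8000;
```

Unlike the regex answers, this also returns a result when the keyword is in the first field, in which case the leading ; is added rather than taken from the line.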

Upvotes: 1
