raghu
raghu

Reputation: 131

python reg-ex pattern not matching

I have a reg-ex matching problem with the following pattern and the string. Pattern is basically a name followed by any number of characters followed by one of the phrases(see pattern below) follwed by any number of characters followed by institution name.

pattern = "[David Maxwell|David|Maxwell] .* [educated at|graduated from|attended|studied at|graduate of] .* Eton College"
str = "David Maxwell was educated at Eton College, where he was a King's Scholar and Captain of Boats, and at Cambridge University where he rowed in the winning Cambridge boat in the 1971 and 1972 Boat Races."
match = re.search(pattern, str)

But the search method returns a no match for the above str? Is my reg-ex proper? I'm new to reg-ex. Any help is appreciated

Upvotes: 0

Views: 70

Answers (1)

Bryan Oakley
Bryan Oakley

Reputation: 385980

[...] means "any character from this set of characters". If you want "any word in this group of words" you need to use parenthesis: (...|...).

There's another problem in your expression, where you have .* (space, dot, star, space), which means "a space, followed by zero or more characters, followed by a space". In other words, the shortest possible match is two spaces. However, your text only has one space between "educated at" and "Eton College".

>>> pattern = '(David Maxwell|David|Maxwell).*(educated at|graduated from|attended|studied at|graduate of).*Eton College'
>>> str = "David Maxwell was educated at Eton College, where he was a King's Scholar and Captain of Boats, and at Cambridge University where he rowed in the winning Cambridge boat in the 1971 and 1972 Boat Races."
>>> re.search(pattern, str)
<_sre.SRE_Match object at 0x1006d10b8>

Upvotes: 5

Related Questions