user10433947

Reputation: 103

Why doesn't Python's regex search method consistently return the whole match I expect?

I am doing a practice question on a Regex course:

How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:

My code is as follows:

regex=re.compile(r'Alice|Bob|Carol\seats|pets|throws\sapples\.|cats\.|baseballs\.',re.IGNORECASE)
mo=regex.search(str)
ma=mo.group()

When I pass str='BOB EATS CATS.' or 'Alice throws Apples.', mo.group() returns only 'BOB' or 'Alice' respectively, but I was expecting it to return the whole sentence.

When I pass str='Carol throws baseballs.', mo.group() returns 'baseballs.', which is the last match.

I am confused as to why:

Upvotes: 0

Views: 1004

Answers (4)

Сека Наз

Reputation: 1

Works for all examples in the book

regex = re.compile(r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.', re.I)

Upvotes: 0

Maxim Abazin

Reputation: 1

You can also do it like this:

(\w{3,5}) (\w*) ([^f]\w+)

Upvotes: -1

Mad Physicist

Reputation: 114330

You need to tell your regex to group the lists of options somehow, or it will naturally think it's one giant list, with some elements containing spaces. The easiest way is to use capture groups for each word:

regex=re.compile(r'(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.', re.IGNORECASE)

The trailing period shouldn't be part of an option. If you don't want to use capturing groups for some reason (it won't really affect how the match is made), you can use non-capturing groups instead. Replace (...) with (?:...).
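As a rough illustration of the difference, the sketch below compiles both variants and compares what the match object exposes. The names capturing, non_capturing, and sentence are made up for this example; the patterns are the ones discussed in this answer.

```python
import re

# Capturing groups: each word is also available separately via groups().
capturing = re.compile(
    r'(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.',
    re.IGNORECASE,
)
# Non-capturing groups: same matching behavior, but nothing is captured.
non_capturing = re.compile(
    r'(?:Alice|Bob|Carol)\s+(?:eats|pets|throws)\s+(?:apples|cats|baseballs)\.',
    re.IGNORECASE,
)

sentence = 'BOB EATS CATS.'

mo = capturing.search(sentence)
print(mo.group())    # BOB EATS CATS.
print(mo.groups())   # ('BOB', 'EATS', 'CATS')

mo = non_capturing.search(sentence)
print(mo.group())    # BOB EATS CATS.
print(mo.groups())   # ()
```

Either way, mo.group() is the whole sentence; the only difference is whether the individual words can be retrieved afterwards.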

Your original regex was interpreted as the following set of options:

  • Alice
  • Bob
  • Carol\seats
  • pets
  • throws\sapples\.
  • cats\.
  • baseballs\.

Spaces don't magically separate options; the | operator has the lowest precedence, so each option runs all the way to the next |. Hopefully you can see why no part of 'Carol throws baseballs.' other than 'baseballs.' is present in that list. Something like 'Carol eats baseballs.' would match 'Carol eats' though.
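To see this in action, here is a small sketch that runs the pattern from the question unchanged against a few sentences. The helper first_match is a hypothetical name introduced only for this demonstration.

```python
import re

# The pattern from the question: seven top-level alternatives,
# not three groups of three.
original = re.compile(
    r'Alice|Bob|Carol\seats|pets|throws\sapples\.|cats\.|baseballs\.',
    re.IGNORECASE,
)

def first_match(text):
    """Return the text of the first match, or None if nothing matches."""
    mo = original.search(text)
    return mo.group() if mo else None

print(first_match('BOB EATS CATS.'))           # BOB
print(first_match('Alice throws Apples.'))     # Alice
print(first_match('Carol throws baseballs.'))  # baseballs.
print(first_match('Carol eats baseballs.'))    # Carol eats
```

In each case search() stops at the first position where any one of the seven options matches, which is why only a fragment of the sentence comes back.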

Upvotes: 2

SocketPlayer

Reputation: 166

You should group the alternatives for each word.

Your regex should look like:

regex = r'(?:Alice|Bob|Carol)\s(?:eats|pets|throws)\s(?:apples|cats|baseballs)\.'

Note that I use (?:...) and not (...), because the grouping is only needed to delimit the alternatives, not to capture them.

Upvotes: 0
