Fred the Fantastic

Reputation: 1345

Python regular expression issue

I'm trying to use the re module so that it returns a run of characters up to, but not including, the first character that is immediately followed by a particular string. The re documentation suggests that a negative lookahead, (?!...), can do this. Here is the example I'm currently wrestling with:

import re

str_to_search = 'abababsonab, etc'
first = re.search(r'(ab)+(?!son)', str_to_search)
second = re.search(r'.+(?!son)', str_to_search)

first.group() is 'abab', which is what I'm aiming for. However, second.group() returns the entire str_to_search string, even though I'm trying to make it stop at 'ababa', since the next 'b' is immediately followed by 'son'. Where am I going wrong?
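A short illustration of what is going on (a sketch, not part of the original question): the lookahead is tested only once, at the position where the quantified part stops. It does not check each character along the way. For (ab)+ the regex engine has to back off from 'ababab' to 'abab' to satisfy (?!son), but .+ runs to the end of the string, where nothing follows, so (?!son) succeeds immediately.

```python
import re

str_to_search = 'abababsonab, etc'

# (ab)+ first grabs 'ababab'; the lookahead then sees 'son' and fails,
# so the engine backtracks to 'abab', after which 'absonab...' follows.
first = re.search(r'(ab)+(?!son)', str_to_search)

# .+ greedily runs to the end of the string; at that position nothing
# follows, so (?!son) succeeds and no backtracking is needed.
second = re.search(r'.+(?!son)', str_to_search)

print(first.group())   # abab
print(second.group())  # abababsonab, etc
```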

Upvotes: 0

Views: 76

Answers (3)

user1301404

This should work:

second = re.search(r'(.(?!son))+', str_to_search)
# second.group() is 'ababa'
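To see why this works, here is a plain-Python loop that performs the same per-character test the pattern (.(?!son))+ applies: keep taking characters until one is immediately followed by 'son'. The helper name take_until_before is hypothetical, and the sketch assumes the match starts at index 0 and that the text contains no newlines (. does not match '\n' by default, and re.search would retry at later positions if the first character failed).

```python
import re


def take_until_before(text, stop):
    """Hypothetical helper: collect characters from the start of `text`,
    stopping before the first character that is immediately followed by
    `stop`. Mirrors (.(?!son))+ when the match begins at index 0."""
    out = []
    for i, ch in enumerate(text):
        # Same test the lookahead (?!son) performs right after each character.
        if text.startswith(stop, i + 1):
            break
        out.append(ch)
    return ''.join(out)


str_to_search = 'abababsonab, etc'
print(take_until_before(str_to_search, 'son'))           # ababa
print(re.search(r'(.(?!son))+', str_to_search).group())  # ababa
```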

Upvotes: 1

sdanzig

Reputation: 4500

It's not the simplest thing, but you can match a repeating sequence of "a character not followed by 'son'". Put the repeated expression in a non-capturing group, (?: ... ), so it doesn't interfere with your match results. (Otherwise you'd end up with an extra capture group.)

Try this:

import re

str_to_search = 'abababsonab, etc'
second = re.search(r'(?:.(?!son))+', str_to_search)
print(second.group())

Output:

ababa

See it here: http://ideone.com/6DhLgN
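A quick comparison (an illustrative sketch, not from the answer itself) of what the "extra match group" looks like in practice. Both patterns match the same text, but a repeated capturing group only remembers its last repetition, which here is the single character 'a'.

```python
import re

str_to_search = 'abababsonab, etc'

capturing = re.search(r'(.(?!son))+', str_to_search)
non_capturing = re.search(r'(?:.(?!son))+', str_to_search)

# Both patterns match the same text.
print(capturing.group(), non_capturing.group())  # ababa ababa

# A repeated capturing group keeps only its last repetition.
print(capturing.groups())      # ('a',)
# The non-capturing version adds no group at all.
print(non_capturing.groups())  # ()
```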

Upvotes: 2

vish

Reputation: 1056

Not sure exactly what you are trying to do, but:

  1. Check out str.partition.

  2. '.+?' is the minimal (non-greedy) matcher; without the '?', '.+' is greedy and consumes everything it can.

  3. Read the docs for group(...) and groups(...), especially when passing a group number.
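A sketch of the three points above applied to the question's string. It assumes 'son' occurs in the text and is not at the very start; if 'son' is absent, partition returns the whole string as the first element and before[:-1] would wrongly drop its last character. Note that a lazy .+? on its own is not enough here: combined with (?!son) it stops after the first character.

```python
import re

str_to_search = 'abababsonab, etc'

# str.partition splits around the first occurrence of 'son'.
before, sep, after = str_to_search.partition('son')
print(before)       # ababab
print(before[:-1])  # ababa  (drop the character that precedes 'son')

# A lazy .+? stops as early as possible, so with (?!son) it matches
# only the first character.
print(re.search(r'.+?(?!son)', str_to_search).group())  # a

# Lazy .*? followed by one character that must be followed by 'son';
# group(1) holds everything before that character.
m = re.search(r'(.*?).(?=son)', str_to_search)
print(m.group(1))  # ababa
print(m.groups())  # ('ababa',)
```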

Upvotes: 0
