Regex for matching the third, fourth, fifth... word

Question

I have some strings like "aaa bbb ccc", "aaa bbb ccc ddd", "aaa bbb ccc ddd eee"....

I need a regex so that I can get rid of aaa bbb and get everything else.

I'm trying '\w+\s\w+\s(\w+|\s)+' but it's not working.

In [171]: r = re.search(r'\w+\s\w+\s(\w+|\s)+', 'aaa bbb ccc ddd')

In [172]: r.group(0)
Out[172]: 'aaa bbb ccc ddd'

In [173]: r.group(1)
Out[173]: 'ddd'

I'd expect r.group(1) to return 'ccc ddd'.

Adam Smith · Accepted Answer

Your pattern doesn't work because a repeated capturing group keeps only its last iteration: each pass of (\w+|\s)+ overwrites the previous capture, so group(1) ends up as 'ddd'. If you make the repeated part a non-capturing group (including the quantifier) and wrap a capturing group around it, it should work.

import re

pattern = re.compile(r"""
    (?:\w+\s){2}        # two words we don't care about
    (                   # begin capturing
      (?:\w+\s?)+       #   a word plus an optional trailing space, 1+ times
    )                   # stop capturing""", re.X)
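To see the difference on the question's own input, here is a minimal sketch that runs the original attempt next to the fixed pattern. The fix is written on one line here rather than in verbose mode, and the names tail_re and words_after_second are just placeholders for illustration, not anything from the question.

```python
import re

# Same idea as the verbose pattern above, written on one line.
# tail_re and words_after_second are hypothetical names for this sketch.
tail_re = re.compile(r'(?:\w+\s){2}((?:\w+\s?)+)')

def words_after_second(text):
    """Return everything after the first two words, or None if there is no match."""
    match = tail_re.search(text)
    return match.group(1) if match else None

# The original attempt: the repeated capturing group keeps only its last iteration.
original = re.search(r'\w+\s\w+\s(\w+|\s)+', 'aaa bbb ccc ddd')
print(original.group(1))                      # ddd

# The fix: the repetition happens inside a single capturing group.
print(words_after_second('aaa bbb ccc ddd'))  # ccc ddd
```

Checking for None before calling group(1) matters here, since a string with fewer than three words won't match at all.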

Although I'm not sure why you're using regular expressions for this. Isn't str.split better?

s = 'aaa bbb ccc ddd'
result = s.split()[2:]  # ['ccc', 'ddd']
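Note that the slice gives you a list of words, not a string. If you want a single string back like the regex gives you, something along these lines should do it. This is only a sketch: tail_joined and tail_remainder are made-up helper names, and the skip parameter is my own addition.

```python
def tail_joined(text, skip=2):
    """Drop the first `skip` words and rejoin the rest with single spaces."""
    return ' '.join(text.split()[skip:])

def tail_remainder(text, skip=2):
    """Drop the first `skip` words but keep the inner spacing of the rest.

    split(None, skip) stops after `skip` splits, so the last element is
    the untouched remainder of the string.
    """
    parts = text.split(None, skip)
    return parts[skip] if len(parts) > skip else ''

print(tail_joined('aaa bbb ccc ddd'))       # ccc ddd
print(tail_remainder('aaa bbb ccc   ddd'))  # ccc   ddd
```

The first version normalizes whitespace between the remaining words, while the second leaves it as it was in the input, so pick whichever matches what you need.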
