Capture contents in regular expression

Question

I have the following text:

text = 'itunes20140618.tbz'

How would I capture the date here, using a regular expression?

I am currently doing:

date = text.split('.tbz')[0].split('itunes')[-1]

I think using re.findall here would be cleaner for what I am trying to do. Please note that in the regular expression, the capture group needs to come after the specific word "itunes" (not just match any numbers).

hwnd · Accepted Answer

You can use re.search to find your desired match.

>>> import re
>>> re.search(r'\d+', 'itunes20140618.tbz').group()
'20140618'
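
Keep in mind that a bare \d+ returns the first run of digits anywhere in the string, whether or not it follows itunes. A small sketch of that pitfall, using a made-up filename (v2_itunes20140618.tbz is not from the question):

```python
import re

# Hypothetical filename with another digit before the 'itunes' prefix.
name = 'v2_itunes20140618.tbz'

# \d+ alone stops at the first run of digits it finds, here the '2' in 'v2'.
first_digits = re.search(r'\d+', name).group()
print(first_digits)  # 2
```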

Since you state it has to be after the word itunes, you can use a capturing group and refer to that group number to access your match.

>>> import re
>>> re.search(r'itunes(\d+)', 'itunes20140618.tbz').group(1)
'20140618'
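
Note that re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on a filename without the itunes prefix. A minimal guarded sketch; the helper name extract_date and the second filename are invented for illustration:

```python
import re

def extract_date(filename):
    """Return the digits right after 'itunes', or None if there are none."""
    match = re.search(r'itunes(\d+)', filename)
    # re.search gives None on no match, so check before calling .group().
    return match.group(1) if match else None

print(extract_date('itunes20140618.tbz'))  # 20140618
print(extract_date('backup20140618.tbz'))  # None
```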

You can also use a positive lookbehind to ensure the digits come right after the word itunes.

>>> re.search(r'(?<=itunes)\d+', 'itunes20140618.tbz').group()
'20140618'
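
Since the question mentions re.findall, here is a rough sketch of how the same two patterns behave with it. With exactly one capturing group, findall returns a list of that group's text; with no group, it returns the whole matches. The last step assumes the digits are a YYYYMMDD date, which the question implies but does not state outright:

```python
import re
from datetime import datetime

text = 'itunes20140618.tbz'

# One capturing group: findall returns the group's text for each match.
dates = re.findall(r'itunes(\d+)', text)
print(dates)  # ['20140618']

# No capturing group: findall returns the full match, which the
# lookbehind already limits to the digits after 'itunes'.
dates_lookbehind = re.findall(r'(?<=itunes)\d+', text)
print(dates_lookbehind)  # ['20140618']

# Assumption: the captured digits are in YYYYMMDD form.
parsed = datetime.strptime(dates[0], '%Y%m%d').date()
print(parsed)  # 2014-06-18
```

findall returns an empty list when nothing matches, so check the list before indexing into it.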
