David542

Reputation: 110093

Capture contents in regular expression

I have the following text:

text = 'itunes20140618.tbz'

How would I capture the date here, using a regular expression?

I am currently doing:

date = text.split('.tbz')[0].split('itunes')[-1]

I think using re.findall here would be cleaner for what I am trying to do. Please note that in the regular expression, the capture group needs to come right after the specific word "itunes" (not just after any non-digit characters).

Upvotes: 1

Views: 52

Answers (2)

hwnd

Reputation: 70722

You can use re.search to find your desired match.

>>> import re
>>> re.search(r'\d+', 'itunes20140618.tbz').group()
'20140618'

Since you state it has to be after the word itunes, you can use a capturing group and refer to that group number to access your match.

>>> import re
>>> re.search(r'itunes(\d+)', 'itunes20140618.tbz').group(1)
'20140618'

You can also use a positive lookbehind to ensure the digits come right after the word itunes.

>>> re.search(r'(?<=itunes)\d+', 'itunes20140618.tbz').group()
'20140618'
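
One caveat worth adding: re.search returns None when nothing matches, so calling .group() directly raises AttributeError for a filename without the itunes prefix. Below is a minimal sketch of a guarded version. The helper name extract_date is hypothetical, and the eight-digit YYYYMMDD layout is an assumption based on the single example in the question.

```python
import re
from datetime import datetime

# Assumption: the date is exactly 8 digits (YYYYMMDD) right after "itunes".
_ITUNES_DATE = re.compile(r'itunes(\d{8})')

def extract_date(text):
    """Return the date following 'itunes' as a datetime.date, or None if absent."""
    match = _ITUNES_DATE.search(text)
    if match is None:
        # No "itunes" + 8 digits anywhere in the string
        return None
    # strptime raises ValueError if the digits are not a real calendar date
    return datetime.strptime(match.group(1), '%Y%m%d').date()

print(extract_date('itunes20140618.tbz'))  # 2014-06-18
print(extract_date('backup20140618.tbz'))  # None
```

If you only need the raw string, return match.group(1) instead of parsing it.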

Upvotes: 2

CMPS

Reputation: 7769

Regex:

[^\d]*(\d+).*

If you can guarantee that the input always has this form (itunes followed by the date), then you can also use this:

itunes(\d+).*
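
As a rough sketch of how these patterns might be applied in Python (assumed here because the question uses Python): re.match only matches at the start of the string, and the trailing .* can be dropped since it does not affect the captured group. The question mentions re.findall, so that is shown too; with a single capture group it returns a list of the captured strings.

```python
import re

text = 'itunes20140618.tbz'

# First pattern: skip leading non-digits, then capture the first run of digits
m1 = re.match(r'[^\d]*(\d+)', text)
first_digits = m1.group(1) if m1 else None

# Second pattern: require the literal prefix "itunes" before the digits
m2 = re.match(r'itunes(\d+)', text)
date = m2.group(1) if m2 else None

# re.findall with one capture group returns only the captured parts
dates = re.findall(r'itunes(\d+)', text)

print(first_digits)  # 20140618
print(date)          # 20140618
print(dates)         # ['20140618']
```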

Upvotes: 1
