Jeroen Gerits

Reputation: 145

Python regex extract vimeo id from url

embed_url = 'http://www.vimeo.com/52422837'
response = re.search(r'^(http://)?(www\.)?(vimeo\.com/)?([\/\d+])', embed_url)
return response.group(4)

The response is:

5

I was hoping for

52422837

Does anybody have an idea? I'm really bad with regexes :S

Upvotes: 7

Views: 4168

Answers (4)

Steve Chambers

Reputation: 39424

To get everything after the last slash (assuming there is one) the following simple regex should do it:

[^/]*$

(Greedily grabs everything up to the end that isn't a slash.)
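A minimal sketch of that pattern in Python, assuming the URL has no trailing slash, query string, or fragment. The helper name last_segment is hypothetical, not part of the original answer:

```python
import re

def last_segment(url):
    # [^/]*$ matches the longest run of non-slash characters
    # that ends at the end of the string.
    match = re.search(r'[^/]*$', url)
    return match.group(0)

print(last_segment('http://www.vimeo.com/52422837'))
```

Note that a URL ending in a slash would yield an empty string, because nothing follows the last slash.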

Upvotes: 1

Colonel Panic

Reputation: 137584

Don't reinvent the wheel!

>>> import urlparse
>>> urlparse.urlparse('http://www.vimeo.com/52422837')
ParseResult(scheme='http', netloc='www.vimeo.com', path='/52422837', params='',
query='', fragment='')

>>> urlparse.urlparse('http://www.vimeo.com/52422837').path.lstrip("/")
'52422837'
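The session above uses the Python 2 urlparse module. In Python 3 the same function lives in urllib.parse, so a rough equivalent might look like the sketch below. The helper name vimeo_id is hypothetical, and it assumes the ID is the entire path:

```python
from urllib.parse import urlparse

def vimeo_id(url):
    # For 'http://www.vimeo.com/52422837' the path is '/52422837';
    # strip the surrounding slashes to leave only the ID.
    return urlparse(url).path.strip('/')

print(vimeo_id('http://www.vimeo.com/52422837'))
```

Because urlparse separates the query string and fragment from the path, this also tolerates URLs such as 'http://www.vimeo.com/52422837?autoplay=1'.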

Upvotes: 10

Yann

Reputation: 1

Have you tried finishing your regexp with a dollar ($) symbol?

Upvotes: 0

Martijn Pieters

Reputation: 1122092

Use \d+ (no square brackets) to match the run of digits; the slash before them is already consumed by the vimeo\.com/ group:

response = re.search(r'^(http://)?(www\.)?(vimeo\.com/)?(\d+)', embed_url)

Result:

>>> re.search(r'^(http://)?(www\.)?(vimeo\.com/)?(\d+)', embed_url).group(4)
'52422837'

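You were using a character class ([...]) where none was needed. The pattern [\/\d+] matches exactly one character: a /, a + or a digit. Inside the brackets the + is a literal plus sign, not a quantifier, which is why only the first digit, 5, was captured.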
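To see the difference side by side, here is a small sketch that runs both patterns against the URL from the question. The variable names prefix, one_char and all_digits are made up for illustration:

```python
import re

embed_url = 'http://www.vimeo.com/52422837'
prefix = r'^(http://)?(www\.)?(vimeo\.com/)?'

# Character class: matches exactly one character that is '/', a digit or '+'.
one_char = re.search(prefix + r'([\/\d+])', embed_url).group(4)

# Quantified \d: matches one or more consecutive digits.
all_digits = re.search(prefix + r'(\d+)', embed_url).group(4)

print(one_char)
print(all_digits)
```

The first print shows the single character the question reported; the second shows the full ID.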

Upvotes: 5
