Regex to extract a specific number from a URL based on the URL pattern

Question

I am trying to extract the number from the URL. Here is the code I tried:

import re

urlss = 'http://www.deyi.com/thread-24488-1-1.html'
urlss = re.sub('http://www.deyi.com/thread-(.*?)-1-1.html', '', urlss)
print(urlss)

My expected result is the number in the URL, 24488.

How can I achieve this?

Moinuddin Quadri · Accepted Answer

re.sub replaces the matched content in the string. To extract a substring, use re.search instead. You can use the regex below to extract the desired number from the URL:

'(?<=thread-)\d+'

This regex matches the first continuous run of digits immediately after "thread-". The lookbehind (?<=thread-) requires "thread-" to precede the digits but keeps it out of the match.

For example:

>>> urlss = 'http://www.deyi.com/thread-24488-1-1.html'
>>> import re

>>> re.search(r'(?<=thread-)\d+', urlss).group()
'24488'
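
Note that re.search returns None when nothing matches, so calling .group() directly raises AttributeError on a URL of a different shape. A minimal sketch that guards against this, using a capture group closer to the pattern in the question, might look like the following. The names THREAD_ID and extract_thread_id are illustrative assumptions, not part of the original answer, and the pattern assumes the "thread-<number>-<number>-<number>.html" shape shown in the question.

```python
import re

# Hypothetical pattern: captures the digits right after "thread-",
# assuming URLs shaped like .../thread-24488-1-1.html
THREAD_ID = re.compile(r'thread-(\d+)-\d+-\d+\.html')

def extract_thread_id(url):
    """Return the thread number as a string, or None if the URL does not match."""
    match = THREAD_ID.search(url)
    return match.group(1) if match else None

print(extract_thread_id('http://www.deyi.com/thread-24488-1-1.html'))  # 24488
print(extract_thread_id('http://www.deyi.com/forum.html'))             # None
```

The result is a string; wrap it in int() if a numeric value is needed.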
