Reputation: 441
I am trying to extract the number from the URL. Here is the code I tried:
import re
urlss = 'http://www.deyi.com/thread-24488-1-1.html'
urlss = re.sub('http://www.deyi.com/thread-(.*?)-1-1.html', '', urlss)
print(urlss)
My expected result is the number below:
24488
How can I achieve this?
Upvotes: 0
Views: 67
Reputation:
You can use a positive lookahead that wraps a capturing group, (?=(\d+)):
import re
urlss = 'http://www.deyi.com/thread-24488-1-1.html'
# Raw string so that \d reaches the regex engine unchanged
pattern = r'thread-(?=(\d+))'
match = re.search(pattern, urlss)
print(match.group(1))
output:
24488
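The lookahead is not strictly required for this case; a plain capturing group after the literal "thread-" should give the same result. A minimal sketch, assuming a small helper (the name extract_thread_id is hypothetical, not from the question) that also guards against URLs that do not match:

```python
import re

def extract_thread_id(url):
    """Return the digits that follow 'thread-' in url, or None if absent."""
    match = re.search(r'thread-(\d+)', url)
    # re.search returns None when nothing matches, so check before .group()
    return match.group(1) if match else None

print(extract_thread_id('http://www.deyi.com/thread-24488-1-1.html'))  # 24488
```

The None check matters if some of the URLs you process are not thread pages; calling .group() on None raises AttributeError.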
If the URL always follows the same pattern and only the numbers change, you can use a simpler pattern like this:
import re
urlss = 'http://www.deyi.com/thread-24488-1-1.html'
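# Matches the first run of two or more digits (same effect as '(\d+){2}', written more directly)
pattern = r'\d{2,}'
match = re.search(pattern, urlss)
print(match.group())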
output:
24488
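Note that re.search stops at the first match, which is why only the thread number comes back. If you ever need every number in the URL, re.findall is one option. A short sketch; the variable names numbers and thread_id are only for illustration, and what the trailing "1-1" stands for is not stated in the question:

```python
import re

urlss = 'http://www.deyi.com/thread-24488-1-1.html'

# findall returns every non-overlapping run of digits, left to right
numbers = re.findall(r'\d+', urlss)
print(numbers)  # ['24488', '1', '1']

# The thread number is the first element for URLs shaped like this one
thread_id = numbers[0]
print(thread_id)  # 24488
```

This relies on the host name containing no digits, so anchoring on "thread-" is the safer choice when the URLs vary.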
Upvotes: 0
Reputation: 48120
re.sub replaces the matched content in the string. You need to use re.search to extract a substring. You can use the regex below to extract the desired number from the URL:
r'(?<=thread-)\d+'
This regex matches the first continuous run of digits immediately after "thread-".
For example:
>>> urlss = 'http://www.deyi.com/thread-24488-1-1.html'
>>> import re
>>> re.search(r'(?<=thread-)\d+', urlss).group()
'24488'
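To see why the code in the question prints an empty string, here is a small sketch that reuses the question's pattern (with the dots escaped, which is my adjustment). The names removed, kept and found are only for illustration:

```python
import re

urlss = 'http://www.deyi.com/thread-24488-1-1.html'
pattern = r'http://www\.deyi\.com/thread-(.*?)-1-1\.html'

# The pattern matches the whole URL, so replacing it with '' leaves nothing
removed = re.sub(pattern, '', urlss)

# Replacing the match with a backreference to group 1 keeps only the number
kept = re.sub(pattern, r'\1', urlss)

# re.search with the same pattern exposes the captured group directly
found = re.search(pattern, urlss).group(1)

print(repr(removed), kept, found)  # '' 24488 24488
```

So re.sub can be made to work with a backreference, but re.search states the intent (extract, not replace) more plainly.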
Upvotes: 2