Reputation: 52576
I am new to Scrapy. I am crawling this website: https://masothue.vn
I get a list of URLs like these:
https://masothue.com/4700283907-hop-tac-xa-hop-tac-xa-ha-giang
https://masothue.com/2200794515-cong-ty-tnhh-cuc-tuan
https://masothue.com/0107858721-hop-tac-xa-dich-vu-nong-nghiep-thon-bac-ha
https://masothue.com/1701974147-nguyen-thi-thanh-xuan-my-dat
The IDs 4700283907, 2200794515, 0107858721, and 1701974147 are always 10 digits long.
The regex pattern should match https://masothue.com/ followed by a unique 10-digit string. How can I achieve that?
Upvotes: 0
Views: 127
Reputation: 4335
It is not clear whether you want to match each line separately or all lines as a whole.
This code handles a single line:
import re
s = 'https://masothue.com/4700283907-hop-tac-xa-hop-tac-xa-ha-giang'
# Escape the dot so it matches a literal "."; group 1 captures the 10-digit ID.
print(re.search(r'masothue\.com/(\d{10})', s).group(1))
# 4700283907
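If the URLs arrive as one block of text rather than one line at a time, a minimal sketch with `re.findall` could look like the following. The names `text` and `PATTERN` are illustrative, and the `(?!\d)` lookahead is an added assumption that a run of more than 10 digits should not count as an ID.

```python
import re

# Illustrative input: the four URLs from the question joined into one string.
text = """https://masothue.com/4700283907-hop-tac-xa-hop-tac-xa-ha-giang
https://masothue.com/2200794515-cong-ty-tnhh-cuc-tuan
https://masothue.com/0107858721-hop-tac-xa-dich-vu-nong-nghiep-thon-bac-ha
https://masothue.com/1701974147-nguyen-thi-thanh-xuan-my-dat"""

# Literal host, then exactly 10 digits; (?!\d) rejects an 11th digit (assumption).
PATTERN = re.compile(r'https://masothue\.com/(\d{10})(?!\d)')

# findall returns the contents of the single capture group for every match.
ids = PATTERN.findall(text)
print(ids)
# ['4700283907', '2200794515', '0107858721', '1701974147']
```

In a Scrapy crawl, a pattern of this kind can probably also be passed to the `allow` argument of `LinkExtractor` to restrict which links are followed, though that depends on how the spider is set up.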
Upvotes: 1