Vy Do

Reputation: 52576

How to get a list of URLs matching a specific pattern?

I am new to Scrapy. I am crawling this website: https://masothue.vn

I get a list of URLs like these:

https://masothue.com/4700283907-hop-tac-xa-hop-tac-xa-ha-giang
https://masothue.com/2200794515-cong-ty-tnhh-cuc-tuan
https://masothue.com/0107858721-hop-tac-xa-dich-vu-nong-nghiep-thon-bac-ha
https://masothue.com/1701974147-nguyen-thi-thanh-xuan-my-dat

4700283907, 2200794515, 0107858721, and 1701974147 are always 10-digit numbers.

The regex pattern should be https://masothue.com/ followed by a unique 10-digit string. How can I achieve that?

Upvotes: 0

Views: 127

Answers (1)

Lei Yang

Reputation: 4335

It is not clear whether you want to match each line individually or all lines as a whole.
This code handles a single line:

import re

s = 'https://masothue.com/4700283907-hop-tac-xa-hop-tac-xa-ha-giang'
# Escape the dot so it matches a literal "." and capture the 10 digits after the slash
print(re.search(r'masothue\.com/(\d{10})', s).group(1))
# 4700283907
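
If you want to handle the whole list at once instead, here is a minimal sketch of one way to do it. It assumes the URLs are already collected in a Python list; the helper name `filter_company_urls` and the non-matching sample URL are made up for illustration. It keeps only URLs whose path starts with exactly 10 digits and drops repeated codes.

```python
import re

# Site root, then exactly 10 digits, then either a hyphen (start of the slug) or end of string.
URL_PATTERN = re.compile(r'^https://masothue\.com/(\d{10})(?:-|$)')

def filter_company_urls(urls):
    """Return (url, code) pairs for matching URLs, keeping the first occurrence of each code."""
    seen = set()
    matches = []
    for url in urls:
        m = URL_PATTERN.match(url)
        if m and m.group(1) not in seen:
            seen.add(m.group(1))
            matches.append((url, m.group(1)))
    return matches

urls = [
    'https://masothue.com/4700283907-hop-tac-xa-hop-tac-xa-ha-giang',
    'https://masothue.com/2200794515-cong-ty-tnhh-cuc-tuan',
    'https://masothue.com/some-other-page',  # hypothetical URL that should be skipped
    'https://masothue.com/2200794515-cong-ty-tnhh-cuc-tuan',  # duplicate code
]

for url, code in filter_company_urls(urls):
    print(code, url)
```

The `(?:-|$)` at the end is there so that a path starting with 11 or more digits is not accepted by accident.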

Upvotes: 1
