Scrapy rules for links selection

Question

I am trying to do a vertical crawl of pages whose URLs follow a simple pattern:

They contain /MLA#### or /MLA-#### (where each # is a digit).

The problem is that with the following code Scrapy only picks up the pages named /MLA-####. When a /MLA#### or /####MLA### page appears, the rule does not match and the scrape is wrong.

    rules = (
        Rule(LinkExtractor(allow=r'/_Desde_'), follow=True),
        Rule(LinkExtractor(allow='/' + 'MLA'), follow=True, callback='parse_items'),
    )

Previously it was as follows:

    rules = (
        Rule(LinkExtractor(allow=r'/_Desde_'), follow=True),
        Rule(LinkExtractor(allow=r'/MLA'), follow=True, callback='parse_items'),
    )

So how can I tell my code to follow every link that contains MLA, no matter what precedes or follows it?

Thanks for your comments. Regards

Answers (1)
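
One possible approach, offered as a sketch rather than a confirmed fix: `LinkExtractor` treats `allow` as a regular expression that is searched for anywhere in the absolute URL. A pattern such as `r'/MLA'` therefore requires a slash immediately before `MLA`, which cannot match a /####MLA### link. The snippet below tests a looser pattern with plain `re`, so it runs without Scrapy. The name `MLA_PATTERN`, the helper `matches_mla`, and the example.com URLs are all made up for illustration.

```python
import re

# Hypothetical pattern: a slash, optional digits, "MLA", an optional hyphen, then digits.
MLA_PATTERN = r'/\d*MLA-?\d+'

# Made-up sample URLs; example.com is a placeholder, not the real site.
sample_urls = [
    'https://example.com/MLA-1234',
    'https://example.com/MLA1234',
    'https://example.com/1234MLA567',
    'https://example.com/_Desde_51',
]


def matches_mla(url):
    """Mimic how `allow` is applied: a regex search anywhere in the URL."""
    return re.search(MLA_PATTERN, url) is not None


# Keep only the URLs the pattern would let through.
matched = [url for url in sample_urls if matches_mla(url)]
print(matched)
```

If the pattern behaves as wanted on real links, the same string could be passed as `allow=MLA_PATTERN` in the second rule. If literally any link containing MLA should be followed, `allow=r'MLA'` is the loosest option, at the cost of also matching unrelated URLs that happen to contain those letters.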