user2129794

Reputation: 2418

Select all elements with a particular id pattern in Python Scrapy

I am using Scrapy to scrape a website. I want to select all elements with an id of the form 'result_%s', where %s is any integer.

sites.select('//*[@id="result_1"]')

How can this be achieved?

Upvotes: 4

Views: 7818

Answers (1)

ScrapyNovice

Reputation: 296

In Scrapy, the main way to pull information out of a page is with Selectors, and the most popular way to use Scrapy's Selectors is with XPath expressions.

XPath has a few handy functions, one of which is contains(). You can use it in your spider like so:

import scrapy
from scrapy.selector import Selector

class ExampleSpider(scrapy.Spider):
    name = "exampleSpider"
    start_urls = ["http://example.com/sitemap.html"]

    def parse(self, response):
        # Wrap the response in a Selector (response.xpath() is an equivalent shortcut)
        sel = Selector(response)
        # Match every element whose id attribute contains the substring 'result_'
        results = sel.xpath("//*[contains(@id, 'result_')]")
        for result in results:
            # Do something with each result here; this just prints its HTML
            print(result.get())

contains() returns true when its second argument is a substring of its first, so the expression above matches every element whose id contains 'result_'.

The official Scrapy tutorial is a great resource if you want to learn more about structuring your spider and extracting data from a page.

Upvotes: 6
