Reputation: 2418
I am using Scrapy to scrape a website. I want to select all elements whose id has the form 'result_%s', where %s is any integer. This selects only one of them:
sites.select('//*[@id="result_1"]')
How can I select all of them?
Upvotes: 4
Views: 7818
Reputation: 296
In Scrapy, the main way to pull information out of a page is with Selectors. The most popular way to use Scrapy's Selectors is with XPath expressions.
XPath has a few handy functions, one of which is contains(). You can use it in your spider like so:
# scrapy.spider was an old module path; current Scrapy exposes Spider at the top level
from scrapy import Spider
from scrapy.selector import Selector


class ExampleSpider(Spider):
    name = "exampleSpider"
    start_urls = ["http://example.com/sitemap.html"]

    def parse(self, response):
        sel = Selector(response)
        # every element whose id attribute contains the substring 'result_'
        results = sel.xpath("//*[contains(@id, 'result_')]")
        for result in results:
            # do something with the results here
            print(result.extract())
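contains() checks whether its second argument is a substring of its first. That means contains(@id, 'result_') matches every element whose id includes 'result_' anywhere, not only ids of the form result_ followed by an integer: ids such as 'result_abc' or 'my_result_3' would match too.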
The Official Scrapy Tutorial is a great resource if you want to learn more about structuring your spider and extracting data from a page.
Upvotes: 6