Reputation: 876
I'm new to Scrapy and definitely need pointers. I've run through some examples, but I'm not getting some of the basics. I'm running Scrapy 1.0.3.
Spider:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from matrix_scrape.items import MatrixScrapeItem
class MySpider(BaseSpider):
    name = "matrix"
    allowed_domains = ["https://www.kickstarter.com/projects/2061039712/matrix-the-internet-of-things-for-everyonetm"]
    start_urls = ["https://www.kickstarter.com/projects/2061039712/matrix-the-internet-of-things-for-everyonetm"]
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        item = MatrixScrapeItem()
        item['backers'] = hxs.select("//*[@id="backers_count"]/data").extract()
        item['totalPledged'] = hxs.select("//*[@id="pledged"]/data").extract()
        print backers, totalPledged
Item:
import scrapy
class MatrixScrapeItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    backers = scrapy.Field()
    totalPledged = scrapy.Field()
    pass
I'm getting the error:
File "/home/will/Desktop/repos/scrapy/matrix_scrape/matrix_scrape/spiders/test.py", line 15
item['backers'] = hxs.select("//*[@id="backers_count"]/data").extract()
My questions are: why aren't the selecting and extracting working properly? I also see a lot of people using Selector instead of HtmlXPathSelector.
Also, I'm trying to save this to a CSV file and automate it on a schedule (extract these data points every 30 minutes). If anyone has any pointers to examples of that, they'd get super brownie points :)
Upvotes: 3
Views: 85
Reputation: 474201
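The syntax error is caused by the way you use double quotes: the double quotes around backers_count end the string literal that the outer double quotes opened. Use single quotes for the outer string so the inner double quotes are part of it: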
item['backers'] = hxs.select('//*[@id="backers_count"]/data').extract()
item['totalPledged'] = hxs.select('//*[@id="pledged"]/data').extract()
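To see why the quoting matters, here is a minimal standalone sketch in plain Python (no Scrapy needed). It shows three ways to write an XPath string that contains quotes; the variable names are illustrative only.

```python
# Three ways to write an XPath expression that itself contains quotes.
# Variable names are illustrative; only the quoting matters here.

# 1. Single quotes outside, double quotes inside (the fix suggested above).
single_outer = '//*[@id="backers_count"]/data'

# 2. Double quotes outside, with the inner double quotes escaped.
escaped = "//*[@id=\"backers_count\"]/data"

# 3. Double quotes outside, single quotes inside; XPath accepts either
#    quote style around an attribute value.
single_inner = "//*[@id='backers_count']/data"

# The first two are the exact same Python string.
print(single_outer == escaped)
# The third differs only in which quote character XPath sees.
print(single_inner.replace("'", '"') == single_outer)
```

Any of the three forms avoids the syntax error; the first is usually the easiest to read.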
As a side note, you can use the response.xpath() shortcut instead of instantiating HtmlXPathSelector:
def parse(self, response):
    item = MatrixScrapeItem()
    item['backers'] = response.xpath('//*[@id="backers_count"]/data').extract()
    item['totalPledged'] = response.xpath('//*[@id="pledged"]/data').extract()
    # print the item fields; the bare names backers and totalPledged are undefined
    print item['backers'], item['totalPledged']
You probably also meant to get the text() of the data elements:
//*[@id="backers_count"]/data/text()
//*[@id="pledged"]/data/text()
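The difference between selecting the data element and selecting its text can be checked offline. The sketch below uses the standard library's ElementTree as a stand-in for Scrapy's selectors. ElementTree supports only a subset of XPath and has no text() step, so the .text attribute is read instead. The markup and the numbers in it are made up and are not the real Kickstarter page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page will differ.
SAMPLE = (
    '<div>'
    '<div id="backers_count"><data value="123">123</data></div>'
    '<div id="pledged"><data value="4567">$4,567</data></div>'
    '</div>'
)

root = ET.fromstring(SAMPLE)

# Selecting .../data gives the element itself (tag, attributes and all).
backers_element = root.find('.//*[@id="backers_count"]/data')
pledged_element = root.find('.//*[@id="pledged"]/data')

# The text inside the element is what .../data/text() asks for in full XPath.
backers_text = backers_element.text
pledged_text = pledged_element.text

print(backers_element.tag)
print(backers_text)
print(pledged_text)
```

In the spider itself, the analogous call would be response.xpath('//*[@id="backers_count"]/data/text()').extract(), which returns a list of strings rather than serialized elements.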
Upvotes: 2