Back to basics: Scrapy

Question

I'm new to Scrapy and could use some pointers. I've worked through a few examples, but I'm still missing some of the basics. I'm running Scrapy 1.0.3.

Spider:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from matrix_scrape.items import MatrixScrapeItem


class MySpider(BaseSpider):
    name = "matrix"
    allowed_domains = ["https://www.kickstarter.com/projects/2061039712/matrix-the-internet-of-things-for-everyonetm"]
    start_urls = ["https://www.kickstarter.com/projects/2061039712/matrix-the-internet-of-things-for-everyonetm"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)

        item = MatrixScrapeItem()
        item['backers'] = hxs.select("//*[@id="backers_count"]/data").extract()
        item['totalPledged'] = hxs.select("//*[@id="pledged"]/data").extract()
        print backers, totalPledged

Item:

import scrapy


class MatrixScrapeItem(scrapy.Item):
    # Fields populated by the spider's parse() callback.
    backers = scrapy.Field()
    totalPledged = scrapy.Field()

I'm getting the error:

File "/home/will/Desktop/repos/scrapy/matrix_scrape/matrix_scrape/spiders/test.py", line 15
    item['backers'] = hxs.select("//*[@id="backers_count"]/data").extract()

My questions are: why isn't the selecting and extracting working properly? I also see a lot of people using Selector instead of HtmlXPathSelector.
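One likely cause of the error on that line is quoting rather than Scrapy itself: the XPath is written inside a double-quoted Python string but also contains double quotes, so Python ends the string at `@id=` and fails to parse the rest. Below is a minimal sketch of the quoting fix using only the standard library, so it runs without Scrapy. The HTML fragment is a made-up stand-in (the real page markup may differ), and `extract_text` is a hypothetical helper. In Scrapy 1.0 the same expression would normally go through `response.xpath(...)` on a `scrapy.Spider` subclass, since `BaseSpider`, `HtmlXPathSelector` and `.select()` are deprecated. Note also that `allowed_domains` expects bare domain names such as "www.kickstarter.com", not full URLs.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the project page markup.
SAMPLE = """
<html><body>
  <div id="backers_count"><data>1234</data></div>
  <div id="pledged"><data>56789</data></div>
</body></html>
"""

# Single quotes delimit the Python string, so the double quotes that the
# XPath expression needs no longer terminate the string early.
BACKERS_XPATH = './/*[@id="backers_count"]/data'
PLEDGED_XPATH = './/*[@id="pledged"]/data'


def extract_text(root, xpath):
    """Return the stripped text of every element matching xpath."""
    return [(el.text or "").strip() for el in root.findall(xpath)]


root = ET.fromstring(SAMPLE.strip())
item = {
    "backers": extract_text(root, BACKERS_XPATH),
    "totalPledged": extract_text(root, PLEDGED_XPATH),
}

# Read the values back through the item: the bare names `backers` and
# `totalPledged` are never defined in the original parse() method.
print("%s %s" % (item["backers"], item["totalPledged"]))

# In a Scrapy callback the equivalent call would look like:
#   response.xpath('//*[@id="backers_count"]/data/text()').extract()
```

The callback would also need to return or yield the item for Scrapy to pass it on to pipelines and exporters.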

I'm also trying to save this to a CSV file and automate it on a schedule (extract these data points every 30 minutes). If anyone has pointers to examples of that, they'd get super brownie points :)
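For the CSV part, Scrapy's built-in feed export (`scrapy crawl matrix -o items.csv`) can write every item the spider yields to a CSV file. If a timestamp per run is wanted, a small helper along these lines may be enough. This is only a sketch, written for Python 3: `append_snapshot` and the `timestamp` column are assumptions made for illustration, not part of Scrapy.

```python
import csv
import os
from datetime import datetime

# Column order for the output file; "timestamp" is an added, assumed column.
FIELDS = ["timestamp", "backers", "totalPledged"]


def append_snapshot(path, backers, total_pledged, when=None):
    """Append one timestamped row, writing the header if the file is new or empty."""
    when = when or datetime.now()
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": when.isoformat(),
            "backers": backers,
            "totalPledged": total_pledged,
        })


# Example call (commented out so that importing this module writes nothing):
#   append_snapshot("matrix_snapshots.csv", "1234", "56789")
```

For the 30-minute schedule, one common approach on Linux is to let cron start the crawl instead of keeping a Python process alive, for example with an entry such as `*/30 * * * * cd /path/to/project && scrapy crawl matrix`, where the path is a placeholder.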
