feners

Reputation: 675

Problems using Scrapy to scrape craigslist.org

I'm using the following setup to scrape the prices from http://puertorico.craigslist.org/search/sya

def parse(self, response):
    items = Selector(response).xpath("//p[@class='row']")
    for items in items:
        item = StackItem()
        item['prices'] = items.xpath("//a[@class='i']/span[@class='price']/text()").extract()
        item['url'] = items.xpath("//span[@class='href']/@href").extract()
        yield item

My problem is that when I run the script, ALL of the prices on the page are shown for each item. How can I get only the price and URL of each individual item?

Upvotes: 0

Views: 159

Answers (1)

Birei

Reputation: 36282

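The problem is that an XPath expression starting with // searches the whole document, even when you call it on the selector for a single row, so every item collects every price on the page. Use a relative XPath expression instead (one starting with ./), which is evaluated in the context of the current row and walks down to the node you want. Also, use a different variable name in the for loop than the list you are iterating over, like: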

import scrapy
from craiglist.items import StackItem

class CraiglistSpider(scrapy.Spider):
    name = "craiglist_spider"
    allowed_domains = ["puertorico.craigslist.org"]
    start_urls = (
        'http://puertorico.craigslist.org/search/sya',
    )

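    def parse(self, response):
        # One selector per listing row.
        items = response.selector.xpath("//p[@class='row']")
        for i in items:
            item = StackItem()
            # Paths starting with "./" are evaluated relative to the current row,
            # so each item only gets its own price and link.
            item['prices'] = i.xpath("./a/span[@class='price']/text()").extract()
            item['url'] = i.xpath("./span[@class='txt']/span[@class='pl']/a/@href").extract()
            yield item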

It yields:

{'prices': [u'$130'], 'url': [u'/sys/5105448465.html']}
2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
{'prices': [u'$250'], 'url': [u'/sys/5096083890.html']}
2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
{'prices': [u'$100'], 'url': [u'/sys/5069848699.html']}
2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
{'prices': [u'$35'], 'url': [u'/syd/5007870110.html']}
...
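The difference between searching from the document root and searching from the current row can be reproduced offline. The sketch below is a stdlib-only analogue, not Scrapy's API: it uses xml.etree.ElementTree on a simplified, hypothetical XHTML snippet modelled on the rows above (real Craigslist HTML is not well-formed XML, and its markup may have changed), and urllib.parse.urljoin to turn the relative /sys/... paths into absolute URLs. The helper names scrape_rows and scrape_rows_buggy are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, simplified markup modelled on the rows in the output above.
SAMPLE = """
<html><body>
  <p class="row">
    <a href="/sys/5105448465.html"><span class="price">$130</span></a>
    <span class="txt"><span class="pl"><a href="/sys/5105448465.html">listing</a></span></span>
  </p>
  <p class="row">
    <a href="/sys/5096083890.html"><span class="price">$250</span></a>
    <span class="txt"><span class="pl"><a href="/sys/5096083890.html">listing</a></span></span>
  </p>
</body></html>
""".strip()

BASE_URL = "http://puertorico.craigslist.org/search/sya"


def scrape_rows(markup, base_url):
    """Fixed version: every lookup is relative to the current row."""
    root = ET.fromstring(markup)
    items = []
    for row in root.findall(".//p[@class='row']"):
        # "./..." paths are resolved against this row only.
        price = row.find("./a/span[@class='price']")
        link = row.find("./span[@class='txt']/span[@class='pl']/a")
        items.append({
            "price": price.text if price is not None else None,
            # Turn the relative href into an absolute URL.
            "url": urljoin(base_url, link.get("href")) if link is not None else None,
        })
    return items


def scrape_rows_buggy(markup):
    """Mirrors the question's bug: the price search starts from the root."""
    root = ET.fromstring(markup)
    rows = root.findall(".//p[@class='row']")
    # Searching from root (not row) returns every price on the page for each row.
    return [[s.text for s in root.findall(".//span[@class='price']")] for _ in rows]


if __name__ == "__main__":
    for item in scrape_rows(SAMPLE, BASE_URL):
        print(item)
    print(scrape_rows_buggy(SAMPLE))
```

Inside a Scrapy spider the same join can be done with response.urljoin(...), which resolves a path against the URL of the response being parsed.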

Upvotes: 2
