Reputation: 675
I'm using the following set-up to scrape the prices from http://puertorico.craigslist.org/search/sya
def parse(self, response):
    items = Selector(response).xpath("//p[@class='row']")
    for items in items:
        item = StackItem()
        item['prices'] = items.xpath("//a[@class='i']/span[@class='price']/text()").extract()
        item['url'] = items.xpath("//span[@class='href']/@href").extract()
        yield item
My problem is that when I run the script, all of the prices are shown for every item. How can I get only the price of each particular item, together with its URL?
Upvotes: 0
Views: 159
Reputation: 36282
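You can use a relative XPath expression, evaluated in the context of the previous one, that walks down to the node you want. An expression that starts with // searches the whole document even when it is called on a row selector, which is why every item received all of the prices. Also, use a different variable name in the for loop, like: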
import scrapy
from craiglist.items import StackItem


class CraiglistSpider(scrapy.Spider):
    name = "craiglist_spider"
    allowed_domains = ["puertorico.craigslist.org"]
    start_urls = (
        'http://puertorico.craigslist.org/search/sya',
    )

    def parse(self, response):
        # One selector per listing row.
        items = response.selector.xpath("//p[@class='row']")
        for i in items:
            item = StackItem()
            # The leading "./" keeps each query inside the current row.
            item['prices'] = i.xpath("./a/span[@class='price']/text()").extract()
            item['url'] = i.xpath("./span[@class='txt']/span[@class='pl']/a/@href").extract()
            yield item
It yields:
{'prices': [u'$130'], 'url': [u'/sys/5105448465.html']}
2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
{'prices': [u'$250'], 'url': [u'/sys/5096083890.html']}
2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
{'prices': [u'$100'], 'url': [u'/sys/5069848699.html']}
2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
{'prices': [u'$35'], 'url': [u'/syd/5007870110.html']}
...
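To see the difference between the two kinds of query without running a spider, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors. The SAMPLE markup and the helper names scrape_rows and all_prices are made up for illustration, and ElementTree's XPath subset is not identical to Scrapy's, so treat it as an analogy rather than the spider's actual behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modeled on the row structure used in the answer.
SAMPLE = """
<div>
  <p class="row">
    <a class="i" href="/sys/1.html"><span class="price">$130</span></a>
    <span class="txt"><span class="pl"><a href="/sys/1.html">Laptop</a></span></span>
  </p>
  <p class="row">
    <a class="i" href="/sys/2.html"><span class="price">$250</span></a>
    <span class="txt"><span class="pl"><a href="/sys/2.html">Monitor</a></span></span>
  </p>
</div>
""".strip()


def scrape_rows(markup):
    """Return one dict per row, using paths relative to that row."""
    root = ET.fromstring(markup)
    results = []
    for row in root.findall(".//p[@class='row']"):
        # "./" starts at the current row, so other rows are never visited.
        prices = [s.text for s in row.findall("./a/span[@class='price']")]
        urls = [a.get("href")
                for a in row.findall("./span[@class='txt']/span[@class='pl']/a")]
        results.append({"prices": prices, "url": urls})
    return results


def all_prices(markup):
    """Document-wide search: roughly what a leading // does in the question."""
    root = ET.fromstring(markup)
    return [s.text for s in root.findall(".//span[@class='price']")]


if __name__ == "__main__":
    print(scrape_rows(SAMPLE))
    print(all_prices(SAMPLE))
```

The row-scoped helper returns one price and one URL per row, while the document-wide helper returns every price at once, which matches the symptom described in the question.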
Upvotes: 2