Reputation: 379
I am trying to scrape the names and addresses of restaurants from the http://www.just-eat.co.uk/belfast-takeaway webpage. So far, my CSV output has all the names in a single field on one line and all the addresses in another field on the same line. I want one row per restaurant, with its name and its address.
Below is my spider:
import scrapy
from justeat.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["just-eat.co.uk"]
    start_urls = ["http://www.just-eat.co.uk/belfast-takeaway",]

    def parse(self, response):
        for sel in response.xpath('//*[@id="searchResults"]'):
            item = DmozItem()
            item['name'] = sel.xpath('//*[@itemprop="name"]').extract()
            item['address'] = sel.xpath('//*[@class="address"]').extract()
            yield item
and below is my item:
import scrapy

class DmozItem(scrapy.Item):
    name = scrapy.Field()
    address = scrapy.Field()
I then use
scrapy crawl dmoz -o items.csv
to run my code.
Can anyone put me on the right path with my coding?
Upvotes: 0
Views: 990
Reputation: 2739
Here you go :)
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import scrapy

from justeat.items import DmozItem


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["just-eat.co.uk"]
    start_urls = ["http://www.just-eat.co.uk/belfast-takeaway", ]

    def parse(self, response):
        for sel in response.xpath('//*[@id="searchResults"]'):
            # /text() returns the text nodes only, not the surrounding markup
            names = sel.xpath('//*[@itemprop="name"]/text()').extract()
            names = [name.strip() for name in names]
            addresses = sel.xpath('//*[@class="address"]/text()').extract()
            addresses = [address.strip() for address in addresses]
            # Pair each name with its address and yield one item per pair,
            # so the CSV export gets one row per restaurant
            result = zip(names, addresses)
            for name, address in result:
                item = DmozItem()
                item['name'] = name
                item['address'] = address
                yield item
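To see why this helps: in the original spider each item field held the whole list returned by .extract(), so the export wrote a single row. Pairing the two lists with zip and emitting one record per pair gives one row per restaurant. Below is a minimal standalone sketch of just that step, using the standard csv module instead of Scrapy's exporter. The sample names and addresses and the helper names rows_from_lists and to_csv are made up for illustration.

```python
import csv
import io

# Made-up sample data standing in for the two lists that .extract() returns.
names = ["  Pizza Place ", "Curry House\n"]
addresses = [" 1 Main Street, Belfast", "22 High Street, Belfast "]


def rows_from_lists(names, addresses):
    """Pair the i-th name with the i-th address, one dict per restaurant."""
    return [
        {"name": name.strip(), "address": address.strip()}
        for name, address in zip(names, addresses)
    ]


def to_csv(rows):
    """Write the rows to an in-memory CSV string, one line per restaurant."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "address"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = rows_from_lists(names, addresses)
csv_text = to_csv(rows)
print(csv_text)
```

One caveat: zip stops at the shorter list, so if a restaurant on the page has a name but no address, later pairs can end up misaligned. If that happens, looping over each restaurant's own element and extracting both fields relative to it is probably the safer route.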
Upvotes: 1