I am trying to scrape the webpage at this link: http://new-york.eat24hours.com/picasso-pizza/19053
I want to get all the available details, such as the address and phone number. So far I have extracted the name, phone, address, reviews, and rating. I also want to extract the restaurant's full menu (each item's name with its price), but I have no idea how to fit this data into the CSV output.
For a single URL, each of the other fields has exactly one value, but the number of menu items differs from restaurant to restaurant.
Here is my code so far:
```python
import scrapy
from urls import start_urls  # local module that defines the list of start URLs


class Eat24Spider(scrapy.Spider):
    name = 'eat24'
    # A plain class attribute is not read as a setting; per-spider
    # settings go in custom_settings.
    custom_settings = {'AUTOTHROTTLE_ENABLED': True}

    def start_requests(self):
        for url in start_urls:
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        NAME_SELECTOR = 'normalize-space(.//h1[@id="restaurant_name"]/a/text())'
        ADDRESS_SELECTOR = 'normalize-space(.//span[@itemprop="streetAddress"]/text())'
        LOCALITY = 'normalize-space(.//span[@itemprop="addressLocality"]/text())'
        REGION = 'normalize-space(.//span[@itemprop="addressRegion"]/text())'
        ZIP = 'normalize-space(.//span[@itemprop="postalCode"]/text())'
        PHONE_SELECTOR = 'normalize-space(.//span[@itemprop="telephone"]/text())'
        RATING = './/meta[@itemprop="ratingValue"]/@content'
        NO_OF_REVIEWS = './/meta[@itemprop="reviewCount"]/@content'
        OPENING_HOURS = './/div[@class="hours_info"]//nobr/text()'
        # Not used yet.
        EMAIL_SELECTOR = './/div[@class="company-info__block"]/div[@class="business-buttons"]/a[span]/@href[substring-after(.,"mailto:")]'

        # extract_first() already returns text; on Python 3, calling
        # .encode('utf8') gives bytes, which cannot be joined with ', '.
        address_parts = [
            response.xpath(sel).extract_first(default='')
            for sel in (ADDRESS_SELECTOR, LOCALITY, REGION, ZIP)
        ]
        yield {
            'name': response.xpath(NAME_SELECTOR).extract_first(),
            'pagelink': response.url,
            'address': ', '.join(address_parts),
            'phone': response.xpath(PHONE_SELECTOR).extract_first(),
            'reviews': response.xpath(NO_OF_REVIEWS).extract_first(),
            'rating': response.xpath(RATING).extract_first(),
            'opening_hours': response.xpath(OPENING_HOURS).extract_first(),
        }
```
Sorry if this is confusing; any help would be appreciated. Thank you in advance!
If you want to extract the full restaurant menu, first locate the element that contains both the item name and its price. Here each menu row is a tr element with an itemscope attribute:
```python
menu_items = response.xpath('//tr[@itemscope]')
```
Then loop over those rows and append each item's name and price to a list:
```python
menu = []
for item in menu_items:
    # Relative XPaths (starting with '.') search only inside the current row.
    menu.append({
        'name': item.xpath('.//a[@class="cpa"]/text()').extract_first(),
        'price': item.xpath('.//span[@itemprop="price"]/text()').extract_first(),
    })
```
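Finally, add a new 'menu' key to the dict you already yield, rather than yielding a separate dict (which would become a separate row):
```python
yield {
    'name': response.xpath(NAME_SELECTOR).extract_first(),
    'pagelink': response.url,
    # ...the remaining fields from the question...
    'menu': menu,
}
```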
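If you would rather get one CSV row per dish than a nested list inside a single cell, another option is to flatten each restaurant into several rows, repeating the single-valued fields on every row. A minimal stdlib sketch; the flatten_restaurant helper, the item_name/item_price column names and the sample values are all hypothetical:

```python
import csv
import io


def flatten_restaurant(restaurant, menu):
    """Yield one flat row per menu item, repeating the restaurant fields.

    A restaurant with an empty menu still yields a single row, so it is
    not silently dropped from the CSV.
    """
    if not menu:
        yield dict(restaurant, item_name='', item_price='')
        return
    for entry in menu:
        yield dict(restaurant, item_name=entry['name'], item_price=entry['price'])


# Hypothetical sample data in the shape the spider produces.
restaurant = {'name': 'Picasso Pizza', 'phone': '555-0100'}
menu = [
    {'name': 'Cheese Pizza', 'price': '$10.00'},
    {'name': 'Garlic Knots', 'price': '$4.50'},
]

rows = list(flatten_restaurant(restaurant, menu))

# Write the flat rows the same way a CSV export would: one line per dict.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

In the spider this would mean `yield from flatten_restaurant(restaurant, menu)` at the end of parse (with restaurant being the dict of single-valued fields) instead of yielding one dict with a 'menu' key; each yielded dict then becomes one line of the CSV.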
Also, I suggest using Scrapy Items to store the scraped data: https://doc.scrapy.org/en/latest/topics/items.html
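As a sketch of that suggestion: besides scrapy.Item subclasses, Scrapy 2.2 and later also accept dataclass instances as items, which gives every scraped field a fixed, declared shape. The RestaurantItem and MenuEntry names below are hypothetical, chosen to mirror the dict keys used in the question:

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MenuEntry:
    name: Optional[str]
    price: Optional[str]


@dataclass
class RestaurantItem:
    name: Optional[str]
    pagelink: str
    address: str = ''
    phone: Optional[str] = None
    reviews: Optional[str] = None
    rating: Optional[str] = None
    opening_hours: Optional[str] = None
    # The variable-length part: one MenuEntry per menu row.
    menu: List[MenuEntry] = field(default_factory=list)


item = RestaurantItem(
    name='Picasso Pizza',
    pagelink='http://new-york.eat24hours.com/picasso-pizza/19053',
)
item.menu.append(MenuEntry(name='Cheese Pizza', price='$10.00'))  # hypothetical values
print(asdict(item))
```

In parse you would then `yield RestaurantItem(...)` instead of a plain dict. The nested menu list still needs flattening or serializing before it fits neatly into a CSV.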
To write the data to a CSV file, use Scrapy's feed exports; from the project directory, run:
```shell
scrapy crawl eat24 -o restaurants.csv
```
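One caveat with `-o restaurants.csv`: as far as I can tell, Scrapy's CSV exporter only joins list fields whose elements are strings, so a menu list of dicts lands in the cell as its Python representation, which is awkward to read back. A minimal sketch of one alternative, assuming you keep one row per restaurant: serialize the menu to a JSON string before yielding it. The menu_to_cell and cell_to_menu helpers and the sample values are hypothetical:

```python
import csv
import io
import json


def menu_to_cell(menu):
    """Serialize a list of {'name': ..., 'price': ...} dicts into one CSV cell.

    JSON keeps item names containing commas or quotes intact; the csv
    module adds quoting around the cell when it is written.
    """
    return json.dumps(menu, ensure_ascii=False)


def cell_to_menu(cell):
    """Inverse of menu_to_cell, for reading the CSV back later."""
    return json.loads(cell) if cell else []


menu = [{'name': 'Cheese Pizza, Large', 'price': '$10.00'}]  # hypothetical values
cell = menu_to_cell(menu)

# Round-trip through the csv module to check the cell survives quoting.
buffer = io.StringIO()
csv.writer(buffer).writerow(['Picasso Pizza', cell])
buffer.seek(0)
restored = cell_to_menu(next(csv.reader(buffer))[1])
print(restored)
```

In parse this would be `'menu': menu_to_cell(menu)` in the yielded dict. JSON is one choice among several; a simple "name (price); name (price)" string is easier to read by eye but harder to parse reliably.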