Reputation: 41
I've been having a difficult time with this one.
I want to scrape all the prices listed for this Bruno Mars concert at the Hollywood Bowl so I can get the average price.
http://www.stubhub.com/bruno-mars-tickets/bruno-mars-hollywood-hollywood-bowl-31-5-2014-4449604/
I've located the prices in the HTML, and the XPath is pretty straightforward, but I cannot get any values to return.
I think it has something to do with the content being generated via JavaScript or AJAX, but I can't figure out how to send the correct request to make the code work.
Here's what I have:
from scrapy.spider import BaseSpider
from scrapy.selector import Selector

from deeptix.items import DeeptixItem


class TicketSpider(BaseSpider):
    name = "deeptix"
    allowed_domains = ["stubhub.com"]
    start_urls = ["http://www.stubhub.com/bruno-mars-tickets/bruno-mars-hollywood-hollywood-bowl-31-5-2014-4449604/"]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[contains(@class, "q_cont")]')
        items = []
        for site in sites:
            item = DeeptixItem()
            item['price'] = site.xpath('span[contains(@class, "q")]/text()').extract()
            items.append(item)
        return items
Any help would be greatly appreciated; I've been struggling with this one for quite some time now. Thank you in advance!
Upvotes: 2
Views: 4416
Reputation: 474161
According to the Chrome network console, there is an AJAX request that loads all of the info about the event, including ticket information.
You don't need Scrapy at all: urllib2 can fetch the data, and the json module can extract the ticket prices:
import json
from pprint import pprint
import urllib2
url = 'http://www.stubhub.com/ticketAPI/restSvc/event/4449604'
data = json.load(urllib2.urlopen(url))
tickets = data['eventTicketListing']['eventTicket']
prices = [ticket['tc']['amount'] for ticket in tickets]
pprint(sorted(prices))
prints:
[156.0,
159.0,
169.0,
175.0,
175.0,
194.5,
199.0,
...
]
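Since the original goal was the average price, the sorted list can be reduced in one line. This is a minimal sketch using the sample values printed above; in practice you would reuse the full prices list built from the JSON response:

```python
# Sample prices from the output above; in practice, use the full
# `prices` list built from the API response.
prices = [156.0, 159.0, 169.0, 175.0, 175.0, 194.5, 199.0]

# Arithmetic mean of the listed prices.
average = sum(prices) / len(prices)
print(round(average, 2))  # -> 175.36
```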
Hope that helps.
Upvotes: 1