Reputation: 419
I'm trying to get to the data from this website (https://pigeon-ndb.com/races/). The data is accessed by clicking one option value in one drop-down menu ("Choose an Organization") and then clicking another value from a subsequent drop-down menu ("Choose a Race") that fills with options according to the value clicked in the previous drop-down menu.
The goal is to get to the table of data values after going through the two drop-down menus and scrape them with scrapy.
I've already tried to grab the option values in the first drop-down menu ("Choose an Organization") using this XPath:
response.xpath('//select[@id="organization-selection"]/option/@value').extract()
Output: [u'<option disabled>Loading...</option>']
I expected the values of all the options in the drop-down menu (more than one), but got only a single placeholder option that is not useful.
I'd like to avoid using Selenium to click through the options (too slow). Would appreciate a scrapy solution. Thanks!
Upvotes: 0
Views: 40
Reputation: 14233
If you look carefully at the requests the page sends, you will notice two GET requests, among others:
https://pigeon-ndb.com/api/?request=get_organizations&database=2019%20OB&_=1556648619801
and
They return the organisations and races as JSON. It's up to you to construct the second one for every organisation returned by the first.
EDIT: Note that you need to send the database in the Cookie header.
EDIT2:
import requests

# The API reads the selected database from the Cookie header.
headers = {'Cookie': 'database=2019 OB'}
url = 'https://pigeon-ndb.com/api/'

# First request: fetch all organizations.
payload = {'request': 'get_organizations'}
resp = requests.get(url, params=payload, headers=headers)

for org in resp.json()['data'][:2]:  # just the first two organizations
    # Second request: fetch the races for this organization.
    payload = {'request': 'get_races', 'organization': org.get('Sys')}
    resp = requests.get(url, params=payload, headers=headers)
    print(resp.json())
This prints the races for the first two organizations. Furthermore, you can supply _ as a parameter; it is a timestamp counted from the Unix epoch.
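As a rough illustration of the _ parameter, here is a minimal sketch that builds the API URL with only the standard library. The helper name build_api_url is made up for this example, and the use of milliseconds is an assumption based on the 13-digit value in the sample URL above.

```python
import time
from urllib.parse import urlencode

API_URL = "https://pigeon-ndb.com/api/"


def build_api_url(request, timestamp_ms=None, **params):
    """Build an API URL, appending '_' as a timestamp since the Unix epoch.

    Assumption: the site expects milliseconds, as the 13-digit sample
    value (1556648619801) suggests.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    query = {"request": request, **params, "_": timestamp_ms}
    return API_URL + "?" + urlencode(query)


# Reusing the timestamp from the sample request for a reproducible result.
print(build_api_url("get_organizations", timestamp_ms=1556648619801))
# https://pigeon-ndb.com/api/?request=get_organizations&_=1556648619801
```

Omitting timestamp_ms uses the current time, which is probably what the page itself does to avoid cached responses.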
Also, for race details look at
Here, supplying the time is mandatory.
Upvotes: 2