Glenn G.

Reputation: 419

Selecting elements for scraping data from two connected drop down menus, <option disabled>

I'm trying to get to the data from this website (https://pigeon-ndb.com/races/). The data is accessed by clicking one option value in one drop-down menu ("Choose an Organization") and then clicking another value from a subsequent drop-down menu ("Choose a Race") that fills with options according to the value clicked in the previous drop-down menu.

The goal is to get to the table of data values after going through the two drop-down menus and scrape them with scrapy.

I've already tried to grab the option values in the first drop-down menu ("Choose an Organization") using this XPath:

response.xpath('//select[@id="organization-selection"]/option/@value').extract()

Output: [u'<option disabled>Loading...</option>']

I expected values from all the options in the drop-down menu (more than one), but got only a single placeholder option, which is not useful.

I'd like to avoid using Selenium to click through the options (too slow). Would appreciate a scrapy solution. Thanks!

Upvotes: 0

Views: 40

Answers (1)

buran

Reputation: 14233

If you carefully inspect the requests the page sends (for example, in the Network tab of your browser's developer tools), you will notice two GET requests among others:

https://pigeon-ndb.com/api/?request=get_organizations&database=2019%20OB&_=1556648619801

and

https://pigeon-ndb.com/api/?request=get_races&organization=AMARILLO%20RACING%20PIGEON%20CLUB&orgNum=null&_=1556648619803

They return the organisations and races as JSON. It's up to you to construct the second request for every organisation returned by the first one.
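As a minimal sketch of that construction step (no network access involved), the helper below turns a get_organizations reply into one get_races URL per organisation. The function name build_races_urls and the sample payload are hypothetical; the 'data' list and the 'Sys' field are assumed from the code further down in this answer. The orgNum and _ parameters seen in the observed URL are left out here.

```python
from urllib.parse import quote, urlencode

API_URL = "https://pigeon-ndb.com/api/"


def build_races_urls(organizations_payload):
    """Return one get_races URL per organization in a get_organizations reply."""
    urls = []
    for org in organizations_payload.get("data", []):
        name = org.get("Sys")  # field name assumed from the answer's own code
        if not name:
            continue  # skip entries without a usable organization name
        # quote_via=quote encodes spaces as %20, matching the observed URLs
        query = urlencode(
            {"request": "get_races", "organization": name}, quote_via=quote
        )
        urls.append(f"{API_URL}?{query}")
    return urls


# Hypothetical sample reply; the real JSON may carry more fields.
sample = {"data": [{"Sys": "AMARILLO RACING PIGEON CLUB"}, {"Sys": ""}]}
print(build_races_urls(sample))
```

Building the URLs separately from fetching them makes it easy to hand them to whichever client you prefer, whether that is requests or a Scrapy spider.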

EDIT: Note that you need to send the database in the Cookie header.

EDIT2:

import requests

# The database name is sent in the Cookie header.
headers = {'Cookie': 'database=2019 OB'}
url = 'https://pigeon-ndb.com/api/'

# First request: list all organizations.
payload = {'request': 'get_organizations'}
resp = requests.get(url, params=payload, headers=headers)

for org in resp.json()['data'][:2]:  # just the first two organizations
    # Second request: races for this organization, identified by its 'Sys' value.
    payload = {'request': 'get_races', 'organization': org.get('Sys')}
    races_resp = requests.get(url, params=payload, headers=headers)
    print(races_resp.json())

This prints the races for the first two organizations. Furthermore, you can supply _ as a parameter; it is a timestamp counted from the Unix epoch (in milliseconds, judging by the 13-digit values in the URLs above).
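A small sketch of how such a _ value could be generated, assuming it is a millisecond timestamp as the 13-digit values in the observed URLs suggest. The helper name cache_buster is hypothetical, and whether the API actually requires this parameter is not confirmed here.

```python
import time


def cache_buster(now_seconds=None):
    """Return milliseconds since the Unix epoch as an integer."""
    if now_seconds is None:
        now_seconds = time.time()  # current time in (fractional) seconds
    return round(now_seconds * 1000)


# The value can be added to the query parameters of any of the requests.
payload = {"request": "get_organizations", "_": cache_buster()}
print(payload)
```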

Also, for race details, look at:

https://pigeon-ndb.com/api/?request=get_race_details&racename=BIG%20SPRING&date=03%2F23%2F2019&time=1556501306

Here the time parameter is mandatory.
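For illustration, the race-details URL above can be assembled like this. The helper name build_race_details_url is hypothetical, and the time value is simply passed through as observed; its exact meaning is not documented in this thread.

```python
from urllib.parse import quote, urlencode

API_URL = "https://pigeon-ndb.com/api/"


def build_race_details_url(racename, date, time_value):
    """Build a get_race_details URL; all three arguments are required."""
    if not (racename and date and time_value):
        raise ValueError("racename, date and time are all required")
    params = {
        "request": "get_race_details",
        "racename": racename,
        "date": date,  # MM/DD/YYYY, as in the observed request
        "time": time_value,
    }
    # quote_via=quote yields %20 for spaces; '/' becomes %2F by default
    return f"{API_URL}?{urlencode(params, quote_via=quote)}"


print(build_race_details_url("BIG SPRING", "03/23/2019", 1556501306))
```

The race name and date would presumably come from the get_races reply for the chosen organisation.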

Upvotes: 2
