Reputation: 1056
I am trying to scrape a website that stores the required data in JSON format. I used a spider to get the data from a URL. I am a beginner in Scrapy, especially with JSON, and I cannot understand the JSON response. I want to extract dogId and msgTimeOff. I tried a few things at random and either got a KeyError or data that was not what I needed. One block that contains the required data is below:
{"raceId":"1808334","position":"6","trap":"6","resultHandicap":"","name":"Turkey Blaze","dogSex":"D","dogDateOfBirth":"2017-05-01 05:00","dogSire":"SIDARIAN BLAZE","dogDam":"MISS PRECEDENT","msgTimeOff":"2021-01-05 13:49","status":"6","reservename":"","dogId":"532771","reserveDogId":"","comment":"wide, crowded run-up and first","withdrawreason":"Wide,CrdRnUp&1","calcRTimeS":29.82,"dogColor":"bk","fract":"10\/1","trainer":"A Jenkins","favFlag":"","rpDistDesc":"1 1\/4","splitTime":"4.68","winnersTimeS":"29.48","raceStatus":"P","rStatusCde":"P","finalROutcomeId":"6","reserveYn":"","isNonRunner":"0","isReserved":"0","videoid":""}
It is contained in a list, and there are many such lists. I want to extract all of them. The code I use to get the JSON response is:
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


class MySpider(scrapy.Spider):
    name = "timeline"

    def __init__(self, date='', *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.date = date
        self.start_urls = ['https://greyhoundbet.racingpost.com/results/blocks.sd?race_id=1808334&track_id=4&r_date='+ date +'&r_time=13%3A49&blocks=meetingHeader%2Cresults-meeting-pager%2Clist']

    # Scrapy only calls this automatically if it is named `parse`
    # or passed as the callback of a request.
    def parse2(self, response):
        jsn_data = response.json()
        for datas in jsn_data['list']['forecasts']:
            print(datas)


if __name__ == '__main__':
    spider = 'timeline'
    date = '2021-01-05'
    settings = get_project_settings()
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider, date=date)
    process.start()
Upvotes: 0
Views: 140
Reputation: 65
The JSON you're getting is easier to read once it is formatted with one key per line.
Each JSON object has several keys and values, including dogId and msgTimeOff. You can handle it the same way as a Python dictionary.
So, in the parse2 method:
def parse2(self, response):
    jsn_data = response.json()
    for datas in jsn_data['list']['forecasts']:
        print(datas)
        dogId = datas.get('dogId')  # Will return None if key not found
        msgTimeOff = datas.get('msgTimeOff')
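If the exact nesting under jsn_data['list']['forecasts'] is unclear, which is a common cause of the KeyError, one option is to walk the parsed JSON recursively and collect every dictionary that has both keys. This is only a minimal sketch: find_dog_records is a hypothetical helper name, and the sample structure is an assumption that mirrors the block from the question, not the confirmed shape of the live response.

```python
def find_dog_records(node):
    """Yield (dogId, msgTimeOff) from every dict that contains both keys."""
    if isinstance(node, dict):
        if "dogId" in node and "msgTimeOff" in node:
            yield node["dogId"], node["msgTimeOff"]
        # Keep descending, since records may sit several levels down.
        for value in node.values():
            yield from find_dog_records(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_dog_records(item)


# Assumed shape, for illustration only; the real response may nest differently.
sample = {
    "list": {
        "forecasts": {
            "1808334": [
                {
                    "raceId": "1808334",
                    "dogId": "532771",
                    "msgTimeOff": "2021-01-05 13:49",
                },
            ]
        }
    }
}

records = list(find_dog_records(sample))
print(records)
```

Inside the spider callback, the same helper could be fed the parsed response, for example find_dog_records(response.json()).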
Upvotes: 1
Reputation: 1748
First get the raceIds.
Every raceId has a list of items, and every item has a dogId.
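A rough sketch of that idea, assuming forecasts is a dictionary keyed by raceId whose values are lists of item dictionaries. That layout is inferred from the description above and has not been checked against the live response; dogs_by_race is a hypothetical helper name, and the second item in the sample is made up to show a missing key.

```python
def dogs_by_race(forecasts):
    """Map each raceId to the (dogId, msgTimeOff) pairs of its items."""
    result = {}
    for race_id, items in forecasts.items():
        result[race_id] = [
            # .get() returns None instead of raising KeyError for a missing key
            (item.get("dogId"), item.get("msgTimeOff"))
            for item in items
            if isinstance(item, dict)
        ]
    return result


# Assumed layout: {raceId: [item, item, ...]}
forecasts = {
    "1808334": [
        {"dogId": "532771", "msgTimeOff": "2021-01-05 13:49"},
        {"dogId": "000001"},  # made-up item without msgTimeOff
    ]
}

by_race = dogs_by_race(forecasts)
for race_id, dogs in by_race.items():
    print(race_id, dogs)
```

In the spider this would be called with jsn_data['list']['forecasts'], provided the response really has that layout.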
Upvotes: 1