farhan jatt

Reputation: 1056

Scrape data from a JSON response with Scrapy

I am trying to scrape a website that stores the data I need in JSON format. I used a spider to get the data from a URL, but I cannot make sense of the JSON response, as I am a beginner in Scrapy and especially with JSON. I want to extract dogId and msgTimeOff. My attempts either raised a KeyError or returned the wrong data. One block that contains the required data is shown below:

{"raceId":"1808334","position":"6","trap":"6","resultHandicap":"","name":"Turkey Blaze",
"dogSex":"D","dogDateOfBirth":"2017-05-01 05:00","dogSire":"SIDARIAN BLAZE","dogDam":"MISS PRECEDENT",
"msgTimeOff":"2021-01-05 13:49","status":"6","reservename":"","dogId":"532771","reserveDogId":"",
"comment":"wide, crowded run-up and first","withdrawreason":"Wide,CrdRnUp&1","calcRTimeS":29.82,
"dogColor":"bk","fract":"10\/1","trainer":"A Jenkins","favFlag":"","rpDistDesc":"1 1\/4",
"splitTime":"4.68","winnersTimeS":"29.48","raceStatus":"P","rStatusCde":"P","finalROutcomeId":"6",
"reserveYn":"","isNonRunner":"0","isReserved":"0","videoid":""}

It is contained in a list, and there are many such lists. I want to extract all of them. The code I use to get the JSON response is:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


class MySpider(scrapy.Spider):
    name = "timeline"

    def __init__(self, date='', *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.date = date
        self.start_urls = ['https://greyhoundbet.racingpost.com/results/blocks.sd?race_id=1808334&track_id=4&r_date='+ date +'&r_time=13%3A49&blocks=meetingHeader%2Cresults-meeting-pager%2Clist']

    def parse2(self, response):
        # Decode the JSON body into Python dicts and lists
        jsn_data = response.json()
        for datas in jsn_data['list']['forecasts']:
            print(datas)


if __name__ == '__main__':
    spider = 'timeline'
    date = '2021-01-05'
    settings = get_project_settings()
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider, date=date)
    process.start()

Upvotes: 0

Views: 140

Answers (2)

Nazmus Sakib

Reputation: 65

Here is the JSON you're getting, in a more readable view.

The JSON object has several keys and values, including dogId (line 14) and msgTimeOff (line 11). You can handle it the same way as a Python dictionary.

So, in the parse2 method:

def parse2(self, response):    
    jsn_data = response.json()
    for datas in jsn_data['list']['forecasts']:
        print(datas)
        dogId = datas.get('dogId') #Will return None if key not found
        msgTimeOff = datas.get('msgTimeOff')
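
The same dictionary-style access can be tried outside Scrapy. The following is only a minimal sketch: it parses a shortened copy of the record from the question with the standard json module, and extract_fields is a hypothetical helper name, not part of Scrapy or of the original code.

```python
import json

# Shortened copy of the record from the question (only a few keys kept).
raw = '{"raceId":"1808334","name":"Turkey Blaze","msgTimeOff":"2021-01-05 13:49","dogId":"532771","fract":"10\\/1"}'

# json.loads turns the JSON text into a plain Python dict.
record = json.loads(raw)


def extract_fields(rec):
    """Hypothetical helper: return (dogId, msgTimeOff), None for a missing key."""
    return rec.get("dogId"), rec.get("msgTimeOff")


dog_id, time_off = extract_fields(record)
print(dog_id, time_off)  # prints: 532771 2021-01-05 13:49
```

Using .get() instead of square brackets avoids the KeyError mentioned in the question when a record lacks one of the keys.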

Upvotes: 1

Prince Hamza

Reputation: 1748

First get the raceIds. Every raceId has items, and every item has a dogId.
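
If the exact nesting is unclear, one option is to walk the whole response and pick out every dictionary that carries the wanted keys. This is a rough sketch under assumptions: the sample data below is hypothetical, shaped only after the raceId -> items -> dogId description above, and iter_records is an illustrative helper name. The real response may be nested differently.

```python
def iter_records(node, required=("dogId", "msgTimeOff")):
    """Yield every dict, at any depth, that contains all required keys."""
    if isinstance(node, dict):
        if all(key in node for key in required):
            yield node
        for value in node.values():
            yield from iter_records(value, required)
    elif isinstance(node, list):
        for value in node:
            yield from iter_records(value, required)


# Hypothetical structure (an assumption): raceId -> list of items -> dogId.
sample = {
    "list": {
        "forecasts": {
            "1808334": [
                {"dogId": "532771", "msgTimeOff": "2021-01-05 13:49"},
                {"name": "record without the wanted keys"},
            ]
        }
    }
}

pairs = [(rec["dogId"], rec["msgTimeOff"]) for rec in iter_records(sample)]
print(pairs)  # prints: [('532771', '2021-01-05 13:49')]
```

In a spider, the same helper could be called on the value returned by response.json(), so the code does not depend on knowing each intermediate key in advance.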

Upvotes: 1
