codewithawais


"TypeError: list indices must be integers or slices, not str" error while scraping JSON data

I am trying to scrape JSON data using Scrapy, but I get the error above while parsing the response.

UPDATED:

The first six values work fine, but the remaining ones don't print anything. If I include them, the other values also print N/A. The values are present in the JSON, yet nothing is returned for them.

The expressions causing the errors are the following:

"Website": value['_source']['AgentMarketingCenter']['0']['Website'],
"Facebook": value['_source']['AgentMarketingCenter']['0']['Facebook_URL'],
"LinkedIn": value['_source']['AgentMarketingCenter']['0']['LinkedIn_URL'],
"Twitter": value['_source']['AgentMarketingCenter']['0']['Twitter'],
"BIO": value['_source']['AgentMarketingCenter']['0']['Bio'],
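A minimal reproduction without Scrapy, assuming (as the error message suggests) that `AgentMarketingCenter` is a JSON array of objects. The record below is hypothetical, not real API data. Indexing a Python list with the string `'0'` raises this exact `TypeError`, while the integer `0` works:

```python
# Hypothetical record shaped like one element of resp['hits']['hits'];
# the values are invented for illustration only.
value = {
    "_source": {
        "fullName": "Jane Doe",
        "AgentMarketingCenter": [
            {"Website": "https://example.com"},
        ],
    }
}

marketing = value["_source"]["AgentMarketingCenter"]  # a list of dicts

try:
    marketing["0"]["Website"]  # string index on a list
except TypeError as exc:
    # prints: TypeError: list indices must be integers or slices, not str
    print("TypeError:", exc)

print(marketing[0]["Website"])  # integer index works: https://example.com
```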

import scrapy
import json

class MainSpider(scrapy.Spider):
    name = 'main'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        resp = json.loads(response.body)
        values = resp['hits']['hits']

        for value in values:

            try: 
                yield {
                    'Full Name': value['_source']['fullName'],
                    'Primary Phonenumber':value['_source']['primaryPhone'],
                    "Email": value['_source']['primaryEmail'],
                    "City": value['_source']['agentPrimaryLocation'][0]['city'],
                    "State": value['_source']['agentPrimaryLocation'][0]['state'],
                    "Zip": value['_source']['agentPrimaryLocation'][0]['zipcode'],
                    "Website": value['_source']['AgentMarketingCenter']['0']['Website'],
                    "Facebook": value['_source']['AgentMarketingCenter']['0']['Facebook_URL'],
                    "LinkedIn": value['_source']['AgentMarketingCenter']['0']['LinkedIn_URL'],
                    "Twitter": value['_source']['AgentMarketingCenter']['0']['Twitter'],
                    "BIO": value['_source']['AgentMarketingCenter']['0']['Bio'],
                }

            except KeyError:
                yield { 
                    'Full Name': 'N/A',
                    'Primary Phonenumber': 'N/A',
                    'Email': 'N/A',
                    'City': 'N/A',
                    'State': 'N/A',
                    'Zip': 'N/A',
                    'Website': 'N/A',
                    'Facebook': 'N/A',
                    'LinkedIn': 'N/A',
                    'Twitter': 'N/A',
                    'BIO': 'N/A',
                }
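Two properties of the `try`/`except` above match the symptoms. A single missing key anywhere in the yielded dict sends the whole record to the `except` branch, so every field becomes `N/A`, including those that were present. Also, `except KeyError` does not catch the `TypeError` raised by `['0']`. A minimal sketch with hypothetical records and a hypothetical helper `extract`:

```python
def extract(source):
    """Hypothetical helper mimicking the all-or-nothing try/except above."""
    try:
        return {
            "Full Name": source["fullName"],
            "Bio": source["bio"],  # KeyError if only this field is missing
        }
    except KeyError:
        # One missing field replaces *every* field with N/A.
        return {"Full Name": "N/A", "Bio": "N/A"}


print(extract({"fullName": "Jane Doe", "bio": "Agent"}))
# {'Full Name': 'Jane Doe', 'Bio': 'Agent'}
print(extract({"fullName": "John Roe"}))  # no 'bio' key, so the name is lost too
# {'Full Name': 'N/A', 'Bio': 'N/A'}

# `except KeyError` does not cover TypeError, so this error propagates outward:
try:
    try:
        [{"Bio": "Agent"}]["0"]
    except KeyError:
        print("never reached")
except TypeError as exc:
    print("not caught by except KeyError:", exc)
```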


Answers (1)

Roman


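`AgentMarketingCenter` is a list, so it must be indexed with the integer `0`, not the string `'0'`; indexing a list with a string is what raises `TypeError: list indices must be integers or slices, not str`. In addition, the information you want to collect is not present in every dict, so use the `get` method with a default value to avoid the `KeyError` you are getting: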

# inside the `for value in values:` loop of parse()
item = {
    'Full Name': value['_source']['fullName'],
    'Primary Phonenumber': value['_source']['primaryPhone'],
    "Email": value['_source']['primaryEmail'],
    "City": value['_source']['agentPrimaryLocation'][0]['city'],
    "State": value['_source']['agentPrimaryLocation'][0].get('stateName', 'NA'),
    "Zip": value['_source']['agentPrimaryLocation'][0]['zipcode'],
    # AgentMarketingCenter is a list: use the integer index 0, not '0'
    "Website": value['_source']['AgentMarketingCenter'][0].get('Website', 'NA'),
    "Facebook": value['_source']['AgentMarketingCenter'][0].get('Facebook_URL', 'NA'),
    "LinkedIn": value['_source']['AgentMarketingCenter'][0].get('LinkedIn_URL', 'NA'),
    "Twitter": value['_source']['AgentMarketingCenter'][0].get('Twitter', 'NA'),
    "BIO": value['_source']['AgentMarketingCenter'][0].get('Bio', 'NA'),
}
yield item
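The `.get()` calls still assume that `AgentMarketingCenter` exists and is a non-empty list. If that is not guaranteed, a small helper can walk a path of keys and indices and fall back to a default at any step. `dig` below is a hypothetical helper, not part of Scrapy or the standard library:

```python
from typing import Any, Sequence, Union


def dig(data: Any, path: Sequence[Union[str, int]], default: Any = "NA") -> Any:
    """Walk nested dicts/lists along `path`; return `default` at the first miss.

    Hypothetical helper: str steps are dict keys, int steps are list indices.
    """
    current = data
    for step in path:
        if isinstance(step, int) and isinstance(current, list):
            if not -len(current) <= step < len(current):
                return default  # empty or too-short list
            current = current[step]
        elif isinstance(step, str) and isinstance(current, dict):
            if step not in current:
                return default  # missing key
            current = current[step]
        else:
            return default  # e.g. a str step applied to a list
    return current


# Hypothetical _source whose AgentMarketingCenter list is empty.
source = {"fullName": "Jane Doe", "AgentMarketingCenter": []}
item = {
    "Full Name": dig(source, ["fullName"]),
    "Website": dig(source, ["AgentMarketingCenter", 0, "Website"]),
}
print(item)  # {'Full Name': 'Jane Doe', 'Website': 'NA'}
```

In `parse()`, each field could then be written as `dig(value, ['_source', 'AgentMarketingCenter', 0, 'Bio'])`, so a missing field only affects that one value instead of turning the whole item into `N/A`.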

