I am trying to scrape JSON data with Scrapy, but I am getting an error:
UPDATE:
The first six values work fine. The remaining values don't print anything, and when I include them, the first six also print N/A. The values are present in the JSON, but nothing is returned for them.
The expressions causing the errors are:
"Website": value['_source']['AgentMarketingCenter']['0']['Website'],
"Facebook": value['_source']['AgentMarketingCenter']['0']['Facebook_URL'],
"LinkedIn": value['_source']['AgentMarketingCenter']['0']['LinkedIn_URL'],
"Twitter": value['_source']['AgentMarketingCenter']['0']['Twitter'],
"BIO": value['_source']['AgentMarketingCenter']['0']['Bio'],
import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'main'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        resp = json.loads(response.body)
        values = resp['hits']['hits']
        for value in values:
            try:
                yield {
                    'Full Name': value['_source']['fullName'],
                    'Primary Phonenumber': value['_source']['primaryPhone'],
                    "Email": value['_source']['primaryEmail'],
                    "City": value['_source']['agentPrimaryLocation'][0]['city'],
                    "State": value['_source']['agentPrimaryLocation'][0]['state'],
                    "Zip": value['_source']['agentPrimaryLocation'][0]['zipcode'],
                    "Website": value['_source']['AgentMarketingCenter']['0']['Website'],
                    "Facebook": value['_source']['AgentMarketingCenter']['0']['Facebook_URL'],
                    "LinkedIn": value['_source']['AgentMarketingCenter']['0']['LinkedIn_URL'],
                    "Twitter": value['_source']['AgentMarketingCenter']['0']['Twitter'],
                    "BIO": value['_source']['AgentMarketingCenter']['0']['Bio'],
                }
            except KeyError:
                yield {
                    'Full Name': 'N/A',
                    'Primary Phonenumber': 'N/A',
                    'Email': 'N/A',
                    'City': 'N/A',
                    'State': 'N/A',
                    'Zip': 'N/A',
                    'Website': 'N/A',
                    'Facebook': 'N/A',
                    'LinkedIn': 'N/A',
                    'Twitter': 'N/A',
                    'BIO': 'N/A',
                }
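A minimal sketch of why the except KeyError branch may not help here, assuming (as the answer's use of [0] suggests) that AgentMarketingCenter is a JSON array. json.loads turns a JSON array into a Python list, and indexing a list with the string '0' raises TypeError, which except KeyError does not catch. The record below is made up for illustration; the real API response may differ.

```python
import json

# Hypothetical record, assuming AgentMarketingCenter is a JSON array of objects.
raw = '{"_source": {"fullName": "Jane Doe", "AgentMarketingCenter": [{"Website": "https://example.com"}]}}'
value = json.loads(raw)


def lookup_with_string_index(record):
    """Mimics the question's expression: indexes a list with the string '0'."""
    try:
        return record['_source']['AgentMarketingCenter']['0']['Website']
    except KeyError:
        return 'N/A'  # not reached: the failure here is a TypeError


def lookup_with_int_index(record):
    """Indexes the list with the integer 0 and reads the key with dict.get()."""
    return record['_source']['AgentMarketingCenter'][0].get('Website', 'N/A')


try:
    lookup_with_string_index(value)
except TypeError as exc:
    print('TypeError:', exc)  # list indices must be integers or slices, not str

print(lookup_with_int_index(value))  # https://example.com
```

Because the TypeError escapes the try block, Scrapy logs it as an error in the callback instead of yielding the N/A item.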
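The information you want to collect is not present in every dict, so you need to use the dict get() method with a default value to avoid the error you are getting.
Also note that AgentMarketingCenter is indexed with the integer 0, not the string '0': json.loads turns JSON arrays into Python lists, and lists must be indexed with integers.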
# Inside parse(), replacing the try/except block
for value in values:
    item = {
        'Full Name': value['_source']['fullName'],
        'Primary Phonenumber': value['_source']['primaryPhone'],
        "Email": value['_source']['primaryEmail'],
        "City": value['_source']['agentPrimaryLocation'][0]['city'],
        "State": value['_source']['agentPrimaryLocation'][0].get('stateName', 'NA'),
        "Zip": value['_source']['agentPrimaryLocation'][0]['zipcode'],
        "Website": value['_source']['AgentMarketingCenter'][0].get('Website', 'NA'),
        "Facebook": value['_source']['AgentMarketingCenter'][0].get('Facebook_URL', 'NA'),
        "LinkedIn": value['_source']['AgentMarketingCenter'][0].get('LinkedIn_URL', 'NA'),
        "Twitter": value['_source']['AgentMarketingCenter'][0].get('Twitter', 'NA'),
        "BIO": value['_source']['AgentMarketingCenter'][0].get('Bio', 'NA'),
    }
    yield item
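The get() calls above still assume that every hit has an AgentMarketingCenter list with at least one element. As a hedged sketch, a small hypothetical helper (dig, not part of Scrapy or the standard library) can walk a whole path and fall back to a default for each field independently. The sample hits are invented for illustration.

```python
import json


def dig(data, *path, default='N/A'):
    """Walk nested dicts/lists along `path`; return `default` if any step fails.

    Hypothetical helper. Catches KeyError (missing dict key), IndexError
    (list too short) and TypeError (wrong container type, e.g. a string
    index on a list, or None somewhere along the path).
    """
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hypothetical hits: the second agent has no AgentMarketingCenter at all.
hits = json.loads('''[
  {"_source": {"fullName": "Jane Doe",
               "AgentMarketingCenter": [{"Website": "https://example.com"}]}},
  {"_source": {"fullName": "John Roe"}}
]''')

# Output field name -> path into each hit (keys for dicts, ints for lists).
FIELDS = {
    'Full Name': ('_source', 'fullName'),
    'Website': ('_source', 'AgentMarketingCenter', 0, 'Website'),
    'Twitter': ('_source', 'AgentMarketingCenter', 0, 'Twitter'),
}

items = [{name: dig(hit, *path) for name, path in FIELDS.items()} for hit in hits]
for item in items:
    print(item)
```

Inside parse() this would become yield {name: dig(value, *path) for name, path in FIELDS.items()} for each value. Unlike a single try/except around the whole dict, one missing key only replaces that one field with N/A, so the other fields keep their real values.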