Reputation: 51
I wrote a script to scrape some data from a website, but it only runs for a few pages and then stops with the message "'NoneType' object has no attribute 'a'". Another error that sometimes appears is this:
  File "scrappy3.py", line 31, in <module>
    f.writerow(doc_details)
  File "C:\python\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u015f' in position 251: character maps to <undefined>
Can you please advise me on how to resolve these errors? This is my script:
import requests
import csv
from bs4 import BeautifulSoup
import re
import time

start_time = time.time()
page = 1
f = csv.writer(open("./doctors.csv", "w", newline=''))
while page <= 5153:
    url = "http://www.sfatulmedicului.ro/medici/n_s0_c0_h_s0_e0_h0_pagina" + str(page)
    data = requests.get(url)
    print ('scraping page ' + str(page))
    soup = BeautifulSoup(data.text,"html.parser")
    for liste in soup.find_all('li',{'class':'clearfix'}):
        doc_details = []
        url_doc = liste.find('a').get('href')
        for a in liste.find_all('a'):
            if a.has_attr('name'):
                doc_details.append(a['name'])
        data2 = requests.get(url_doc)
        soup = BeautifulSoup(data2.text,"html.parser")
        a_tel = soup.find('div',{'class':'contact_doc add_comment'}).a
        tel_tag=a_tel['onclick']
        tel = tel_tag[tel_tag.find("$(this).html("):tel_tag.find(");")].lstrip("$(this).html(")
        doc_details.append(tel)
        f.writerow(doc_details)
    page += 1
print("--- %s seconds ---" % (time.time() - start_time))
Upvotes: 2
Views: 2029
Reputation: 322
resp_find = soup.find('div',{'class':'contact_doc add_comment'})
if resp_find is not None:
    a_tel = resp_find.a
You can check whether soup.find() returned None, and apply .a only if it did not.
Alternatively, investigate why soup.find() returns None for some pages, so that you can make sure it always finds the div.
Upvotes: 0
Reputation: 3203
Your error is here
a_tel = soup.find('div',{'class':'contact_doc add_comment'}).a
soup.find is obviously not finding the div with the sought class. The return value is None, and None has no attribute a.
You should check for this and decide whether to continue with further queries in the loop or bail out. For example:
div_contact = soup.find('div',{'class':'contact_doc add_comment'})
if div_contact is None:
    continue
a_tel = div_contact.a
You could also use a try .. except block to cover more cases (like the div not actually containing what you expect)
div_contact = soup.find('div',{'class':'contact_doc add_comment'})
try:
    a_tel = div_contact.a
except AttributeError:
    continue
which is in theory more Pythonic. Your choice in any case.
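One case worth spelling out: if the div exists but contains no <a>, then div_contact.a returns None rather than raising, so the failure would only show up on the next line of the question's script (a_tel['onclick']). A minimal sketch of a helper that checks each step is below. The name contact_onclick is hypothetical, and soup is assumed to be the BeautifulSoup object for one doctor page.

```python
def contact_onclick(soup):
    """Return the onclick string of the contact link, or None if any piece is missing.

    `soup` is assumed to be a BeautifulSoup object for a single doctor page.
    `contact_onclick` is a hypothetical helper name, not part of bs4.
    """
    div_contact = soup.find('div', {'class': 'contact_doc add_comment'})
    if div_contact is None:
        # The div with the sought class is not on this page.
        return None

    a_tel = div_contact.a
    if a_tel is None:
        # The div exists but has no <a> child.
        return None

    # .get() returns None when the attribute is absent instead of raising KeyError.
    return a_tel.get('onclick')
```

Inside the loop this could be used as tel_tag = contact_onclick(soup), followed by "if tel_tag is None: continue", so that a page without a phone link is skipped instead of stopping the whole run.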
Continuous and continued error checking is part of a program.
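The second error in the question, the UnicodeEncodeError, is a separate problem. The cp1252.py frame in the traceback suggests that the file was opened with the Windows default encoding, which cannot represent 'ş' (\u015f). A minimal sketch of one possible fix, assuming the CSV may be written as UTF-8 (the sample row and the temporary path are made up for illustration):

```python
import csv
import os
import tempfile

# Hypothetical sample row; it contains U+015F, the character from the traceback.
row = ["Dr. Popescu", "Bucure\u015fti"]

# Temporary location used only for this sketch; the question writes to ./doctors.csv.
path = os.path.join(tempfile.mkdtemp(), "doctors.csv")

# Passing encoding explicitly avoids falling back to the platform default (cp1252 here).
with open(path, "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(row)

# Read the file back with the same encoding to confirm the row round-trips.
with open(path, newline="", encoding="utf-8") as fh:
    rows_read = list(csv.reader(fh))
```

In the question's script this would amount to adding encoding="utf-8" to the open() call. If the file is meant to be opened in Excel, "utf-8-sig" may be a better choice, since it writes a byte order mark.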
Upvotes: 3