Reputation: 1482
I'm trying to scrape a local folder of HTML files for a couple of variables, but I'm getting an exception about halfway through the loop. The exception is AttributeError: 'NoneType' object has no attribute 'contents'. The problem is not actually .contents itself. I've looked at the file it gets hung up on and it's structured exactly the same as the other files. If you remove .contents, you just raise the same exception but with the find() function. Does anyone know why this is happening? Again, many of the files process without a problem. My code is below:
import os

import pandas as pd
from bs4 import BeautifulSoup

df_list = []
folder = 'rt_html'
for movie_html in os.listdir(folder):
    with open(os.path.join(folder, movie_html)) as file:
        soup = BeautifulSoup(file)
        title = soup.find('title').contents[0][:-len(' - Rotten Tomatoes')]
        audience_score = soup.find('div', class_='audience-score meter').find('span').contents[0][:-1]
        num_audience_ratings = soup.find('div', class_='audience-info hidden-xs superPageFontColor')
        num_audience_ratings = num_audience_ratings.find_all('div')[1].contents[2].strip().replace(',', '')
        # print(num_audience_ratings)
        # break
        df_list.append({'title': title,
                        'audience_score': int(audience_score),
                        'number_of_audience_ratings': int(num_audience_ratings)})
df = pd.DataFrame(df_list, columns=['title', 'audience_score', 'number_of_audience_ratings'])
Upvotes: 2
Views: 163
Reputation: 8225
My guess is that some of the files do not have the elements you are looking for.
For example:
audience_score = soup.find('div', class_ = 'audience-score meter').find('span').contents[0][:-1]
If there is no div with the class audience-score meter, then soup.find('div', class_ = 'audience-score meter') will return None. Any subsequent find or contents access on that result will raise an AttributeError.
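To confirm which lookup is returning None for a given file, something like the following sketch could be run inside the loop. The helper name missing_elements and the EXPECTED table are made up for illustration; the tag names and classes are copied from the question. It only assumes that soup has a BeautifulSoup-style find() method.

```python
# Hypothetical diagnostic helper (not part of BeautifulSoup): reports which of
# the elements the question's code relies on are absent from a parsed page.
EXPECTED = {
    'title': ('title', None),
    'audience score div': ('div', 'audience-score meter'),
    'audience info div': ('div', 'audience-info hidden-xs superPageFontColor'),
}


def missing_elements(soup):
    """Return the labels of expected elements for which soup.find() gives None."""
    missing = []
    for label, (tag, css_class) in EXPECTED.items():
        if css_class is None:
            found = soup.find(tag)
        else:
            found = soup.find(tag, class_=css_class)
        if found is None:
            missing.append(label)
    return missing


# Possible use inside the question's loop, right after soup is created:
#     problems = missing_elements(soup)
#     if problems:
#         print(movie_html, 'is missing:', problems)
```

Printing the file name alongside the missing labels should show whether the failing file really has the same structure as the others.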
A solution would be to wrap the lookup in a try-except and set the value to an empty string.
try:
    audience_score = soup.find('div', class_ = 'audience-score meter').find('span').contents[0][:-1]
except AttributeError:
    audience_score = ""
Do the same for title and num_audience_ratings (both assignments).
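Rather than repeating the try-except three times, the same idea could be folded into a small helper. This is only a sketch: extract_or_default and parse_movie are hypothetical names, not BeautifulSoup functions. It also uses None instead of an empty string as the fallback, because the question's code later calls int() on two of the values and int("") raises a ValueError. IndexError is caught as well, on the assumption that a page might contain the outer div but fewer children than expected.

```python
def extract_or_default(extractor, default=None):
    """Run extractor(); return default if an element or index in the chain is missing."""
    try:
        return extractor()
    except (AttributeError, IndexError):
        return default


def parse_movie(soup):
    """Return the three fields for one parsed page, with None for anything missing."""
    title = extract_or_default(
        lambda: soup.find('title').contents[0][:-len(' - Rotten Tomatoes')])
    audience_score = extract_or_default(
        lambda: soup.find('div', class_='audience-score meter')
                    .find('span').contents[0][:-1])
    num_ratings = extract_or_default(
        lambda: soup.find('div', class_='audience-info hidden-xs superPageFontColor')
                    .find_all('div')[1].contents[2].strip().replace(',', ''))
    return {
        'title': title,
        # Convert only when a value was actually found.
        'audience_score': int(audience_score) if audience_score else None,
        'number_of_audience_ratings': int(num_ratings) if num_ratings else None,
    }


# Possible use inside the question's loop:
#     df_list.append(parse_movie(soup))
```

With this approach the loop keeps going on incomplete pages, and the rows with None values can be inspected or dropped afterwards in the DataFrame.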
Upvotes: 3