LaurieFalcon

Reputation: 103

Trouble scraping with BeautifulSoup

I'm trying to do some scraping and I'm stuck on what is probably a basic problem.

Here's my script so far:

from requests import get
from bs4 import BeautifulSoup

url = 'http://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1'

response = get(url)

soup = BeautifulSoup(response.text, 'html.parser')


movies_containers = soup.find_all('div', class_ = 'lister-item mode-advanced')

names = []
years = []
imdb_ratings = []
metascores = []
votes = []
#gross=[] #many movies have no record
movie_description=[]
movie_duration=[]
movie_genre=[]


for container in movies_containers:
    if container.find_all('div', class_ = 'ratings-metascore') is not None:

        name = container.find('h3', class_ = 'lister-item-header').a.text
        names.append(name)

        year = container.h3.find('span', class_ = 'lister-item-year text-muted unbold').text
        year = year.replace('(', ' ')
        year = year.replace(')', ' ')
        years.append(year)

        imdb_rating = float(container.find('div', class_ = 'inline-block ratings-imdb-rating').text)
        imdb_ratings.append(imdb_rating)

        score = container.find('span', class_ = 'metascore').text
        metascores.append(score)

I get this error:

AttributeError: 'NoneType' object has no attribute 'text'

I don't understand why the line that assigns score doesn't work.

When I remove .text:

score = container.find('span', class_ = 'metascore')

It gives me this:

<span class="metascore favorable">77        </span>

Any ideas?

Thanks

Upvotes: 0

Views: 66

Answers (1)

baduker

Reputation: 20022

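Some containers have no metascore span, so find() returns None for them, and calling .text on None raises the AttributeError. The tag you printed comes from a container that does have one. Check the result before reading its text: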

import requests
from bs4 import BeautifulSoup

url = 'http://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1'

soup = BeautifulSoup(requests.get(url).text, 'html.parser')
movies_containers = soup.find_all('div', class_='lister-item mode-advanced')

names = []
years = []
imdb_ratings = []
metascores = []
votes = []
movie_description = []
movie_duration = []
movie_genre = []

for container in movies_containers:
    # find_all() returns a list and is never None, so use find() for this check
    if container.find('div', class_='ratings-metascore') is not None:
        name = container.find('h3', class_='lister-item-header').a.text
        names.append(name)

        year = container.h3.find('span', class_='lister-item-year text-muted unbold').text
        years.append(year.replace('(', ' ').replace(')', ' '))

        imdb_rating = float(container.find('div', class_='inline-block ratings-imdb-rating').text)
        imdb_ratings.append(imdb_rating)

        score = container.find('span', class_='metascore')
        if score:
            metascores.append(score.getText(strip=True))
print(metascores)

Output:

['77', '74', '67', '84', '94', '76', '73', '85', '69', '81', '86', '88', '45', '81', '87', '75', '58', '65', '44', '62', '39', '65', '94', '48', '82', '52', '54', '93', '56', '73', '52', '41', '75', '47', '77', '63', '34', '75', '29', '51', '37', '65']
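
If you want to see the behaviour without hitting the site, here is a minimal sketch on inline HTML. The markup and the movie names are made up for illustration and only imitate the class names used in the question. It shows that find() returns None for a container without a metascore, and it appends None as a placeholder in that case so that names and metascores stay the same length (that placeholder is my own choice, not something the code above does).

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two result containers; the second has no metascore.
html = """
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header"><a>Movie A</a></h3>
  <div class="inline-block ratings-metascore">
    <span class="metascore favorable">77        </span>
  </div>
</div>
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header"><a>Movie B</a></h3>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

names = []
metascores = []

for container in soup.find_all('div', class_='lister-item mode-advanced'):
    names.append(container.find('h3', class_='lister-item-header').a.text)

    # find() returns None when nothing matches, so test before reading the text.
    score = container.find('span', class_='metascore')
    if score is not None:
        # strip=True removes the trailing whitespace inside the span.
        metascores.append(score.get_text(strip=True))
    else:
        # Placeholder keeps this list aligned with names.
        metascores.append(None)

print(names)       # ['Movie A', 'Movie B']
print(metascores)  # ['77', None]
```

Keeping the lists aligned matters if you later combine them into a table, since a missing score would otherwise shift every following value onto the wrong movie.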

Upvotes: 2
