Reputation: 13
I'm using BeautifulSoup to scrape an IMDb search page (https://www.imdb.com/search/title/?release_date=2017&sort=num_votes,desc&page=1). I've successfully scraped the name, year, intro, votes, director, etc., but I'm having trouble scraping "gross" and "actors". The relevant HTML looks like this:
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="591671">591,671</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span name="nv" data-value="226,277,068">$226.28M</span>
</p>
<p class="">
Director:
<a href="/name/nm0003506/?ref_=adv_li_dr_0">James Mangold</a>
<span class="ghost">|</span>
Stars:
<a href="/name/nm0413168/?ref_=adv_li_st_0">Hugh Jackman</a>,
<a href="/name/nm0001772/?ref_=adv_li_st_1">Patrick Stewart</a>,
<a href="/name/nm6748436/?ref_=adv_li_st_2">Dafne Keen</a>,
<a href="/name/nm2933542/?ref_=adv_li_st_3">Boyd Holbrook</a>
</p>
Below is the code I used:
import requests
from bs4 import BeautifulSoup
directors=[]
actors=[]
votes=[]
grosses=[]
res_movie = requests.get('http://www.imdb.com/search/title/?release_date='+'2018'+'&sort=num_votes,desc&page='+'1')
bs_movie = BeautifulSoup(res_movie.text,'html.parser')
movies=bs_movie.find_all('div', class_='lister-item mode-advanced')
for movie in movies:
    director=movie.find('p',class_='').find_all('a')[0].text
    directors.append(director)
    actors.append(movie.find('p',class_='').find_all('a')[1:].text)
    vote=movie.find_all('span', attrs = {'name':'nv'})[0].text
    votes.append(vote)
    gross=movie.find_all('span', attrs = {'name':'nv'})[1].text
    grosses.append(gross)
The error I'm getting from actors:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-70-a969b9a65fa7> in <module>
60 directors.append(director)
61
---> 62 actors.append(movie.find('p',class_='').find_all('a')[:1].text)
63
64
AttributeError: 'list' object has no attribute 'text'
The error I'm getting from gross:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-69-bd813766e1ca> in <module>
74 votes.append(vote)
75
---> 76 gross=movie.find_all('span', attrs = {'name':'nv'})[1].text
77 grosses.append(gross)
78 # print(directors)
IndexError: list index out of range
I was hoping to use the list's index to get the element I desired. I would love to learn the proper method to obtain the element. Thanks so much in advance!!
Upvotes: 1
Views: 1156
Reputation: 195458
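find_all() returns a list of the elements it found, so you need to iterate over that list to get the text of each element. Calling .text on the list itself (or on a slice of it) is what raises the AttributeError.
For some movies the gross revenue is missing, so there is only one name="nv" span and indexing [1] raises the IndexError. Check that the second span exists before reading it.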
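Both points can be checked offline with a minimal sketch. It assumes the markup from the question, plus a second, hypothetical entry with no Gross span (its vote count is made up for illustration). It matches on the single class `lister-item` to keep the example short.

```python
from bs4 import BeautifulSoup

# First entry: markup from the question. Second entry: hypothetical movie
# without a Gross span, added only to exercise the missing-value branch.
html = """
<div class="lister-item mode-advanced">
  <p class="sort-num_votes-visible">
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="591671">591,671</span>
    <span class="ghost">|</span> <span class="text-muted">Gross:</span>
    <span name="nv" data-value="226,277,068">$226.28M</span>
  </p>
</div>
<div class="lister-item mode-advanced">
  <p class="sort-num_votes-visible">
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="1000">1,000</span>
  </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for movie in soup.find_all("div", class_="lister-item"):
    # find_all() returns a list-like ResultSet, not a single tag.
    nv = movie.find_all("span", attrs={"name": "nv"})
    vote = nv[0].text
    # Only read the second span when it is actually present.
    gross = nv[1].text if len(nv) > 1 else "-"
    rows.append((vote, gross))

print(rows)
```

The second tuple shows the placeholder `'-'` instead of an IndexError.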
Fixed version:
import requests
from bs4 import BeautifulSoup
directors=[]
actors=[]
votes=[]
grosses=[]
url = 'https://www.imdb.com/search/title/?release_date=2018&sort=num_votes,desc&page=1'
res_movie = requests.get(url)
bs_movie = BeautifulSoup(res_movie.text,'html.parser')
movies=bs_movie.find_all('div', class_='lister-item mode-advanced')
for movie in movies:
    director=movie.find('p',class_='').find_all('a')[0].text
    directors.append(director)
    actors.append([a.text for a in movie.find('p',class_='').find_all('a')[1:]]) # <-- using list comprehension
    nv = movie.find_all('span', attrs = {'name':'nv'})
    vote=nv[0].text
    votes.append(vote)
    gross= nv[1].text if len(nv) > 1 else '-' # <-- check if Gross revenue exists for the movie
    grosses.append(gross)
# print the values:
for d, a, v, g in zip(directors, actors, votes, grosses):
    print('{:<22} {!s:<120} {:<12} {}'.format(d, a, v, g))
Prints:
Anthony Russo ['Joe Russo', 'Robert Downey Jr.', 'Chris Hemsworth', 'Mark Ruffalo', 'Chris Evans'] 734,642 $678.82M
Ryan Coogler ['Chadwick Boseman', 'Michael B. Jordan', "Lupita Nyong'o", 'Danai Gurira'] 557,058 $700.06M
David Leitch ['Ryan Reynolds', 'Josh Brolin', 'Morena Baccarin', 'Julian Dennison'] 429,727 $324.59M
Bryan Singer ['Rami Malek', 'Lucy Boynton', 'Gwilym Lee', 'Ben Hardy'] 398,775 $216.43M
John Krasinski ['Emily Blunt', 'John Krasinski', 'Millicent Simmonds', 'Noah Jupe'] 339,291 $188.02M
Steven Spielberg ['Tye Sheridan', 'Olivia Cooke', 'Ben Mendelsohn', 'Lena Waithe'] 324,204 $137.69M
James Wan ['Jason Momoa', 'Amber Heard', 'Willem Dafoe', 'Patrick Wilson'] 317,403 $335.06M
Ruben Fleischer ['Tom Hardy', 'Michelle Williams', 'Riz Ahmed', 'Scott Haze'] 316,446 $213.52M
...and so on.
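Note that in the first row 'Joe Russo' ends up in the actors list: taking index [0] as the director assumes exactly one director per movie. One possible way around this, sketched below, is to classify each link by its href instead of by position. It assumes the `adv_li_dr_` and `adv_li_st_` markers seen in the question's HTML are used consistently across the page, which is not verified here. `split_credits` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
html = """
<p class="">
Director:
<a href="/name/nm0003506/?ref_=adv_li_dr_0">James Mangold</a>
<span class="ghost">|</span>
Stars:
<a href="/name/nm0413168/?ref_=adv_li_st_0">Hugh Jackman</a>,
<a href="/name/nm0001772/?ref_=adv_li_st_1">Patrick Stewart</a>,
<a href="/name/nm6748436/?ref_=adv_li_st_2">Dafne Keen</a>,
<a href="/name/nm2933542/?ref_=adv_li_st_3">Boyd Holbrook</a>
</p>
"""


def split_credits(block):
    """Split the <a> tags under `block` into (directors, stars) by href marker."""
    directors, stars = [], []
    for a in block.find_all("a"):
        href = a.get("href", "")
        if "adv_li_dr_" in href:    # director link (assumed marker)
            directors.append(a.text)
        elif "adv_li_st_" in href:  # star link (assumed marker)
            stars.append(a.text)
    return directors, stars


soup = BeautifulSoup(html, "html.parser")
directors, stars = split_credits(soup)
print(directors)
print(stars)
```

Inside the scraping loop, the same helper could be called as `split_credits(movie)` for each movie, so that movies with several directors keep all of them out of the actors list.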
Upvotes: 2