Reputation: 169
To be clearer: I would like to retrieve how many episodes of a series an actor appeared in (with the dates), as displayed on IMDb.
I'm using the Doctor Who page as an example.
In this case, I would like to know that Matt Smith appeared in 46 episodes, from 2010 to 2020.
IMDbPY does this well for a cast entry, through currentRole and its notes attribute:
from imdb import IMDb
ia = IMDb()
movie = ia.get_movie('0436992') # id for Doctor Who
cast = movie['cast']
print("Actor name :", cast[0]['name'])
print("Role :", cast[0].currentRole)
print("Notes :", cast[0].notes)
This displays:
Actor name : Matt Smith
Role : The Doctor
Notes : (58 episodes, 2010-2020)
(Oddly, the episode count is wrong: the website says 46, and 54 episodes are shown if you click on it, but that is not my point.)
However, other actors played multiple characters in this series, and currentRole then returns a list. I changed my code to handle that case:
from imdb import IMDb

ia = IMDb()
movie = ia.get_movie('0436992')
cast = movie['cast']
for i in range(2):
    print("Actor name :", cast[i]['name'])
    if isinstance(cast[i].currentRole, list):
        print("Roles :")
        for role in cast[i].currentRole:
            print(" - ", role, " (Note :" + role.notes + ")")
    else:
        print("Role :", cast[i].currentRole)
    print("Notes :", cast[i].notes)
    print("")
But the result is :
Actor name : Matt Smith
Role : The Doctor
Notes : (58 episodes, 2010-2020)
Actor name : David Tennant
Roles :
- The Doctor (Note :)
- ... (Note :)
Notes :
I can't retrieve the information I want here: all of the "notes" are empty. I tried digging into the Person and Character objects from IMDbPY while debugging, but couldn't find what I need.
It seems to happen only for actors playing multiple characters. Is there a way to retrieve this with IMDbPY, rather than an external parser?
Any ideas are appreciated.
Upvotes: 2
Views: 220
Reputation: 524
I ran into the same problem. Sadly, I was not able to solve it with IMDbPY either; I suspect it is a bug. Instead, I wrote my own parser with bs4:
import requests
from bs4 import BeautifulSoup

# fetch the full credits page and parse it with bs4 (the 'lxml' parser must be installed)
page = requests.get('https://www.imdb.com/title/tt0436992/fullcredits')
soup = BeautifulSoup(page.text, 'lxml')

# find the cast table
table = soup.find('table', {"class": "cast_list"})
cast = []

# iterate over its rows
for row in table.find_all('tr'):
    column_marker = 0
    columns = row.find_all('td')
    cast_member = {}
    for column in columns:
        # name column
        if column_marker == 1:
            cast_member['name'] = column.get_text().strip()
        # combined role and episodes/years column
        elif column_marker == 3:
            # the role link has no class attribute
            role_element = column.find('a', {'class': None})
            if role_element:
                cast_member['role'] = role_element.get_text().strip()
            # the episodes/years link carries the 'toggle-episodes' class
            episodes_and_years_element = column.find('a', {'class': 'toggle-episodes'})
            if episodes_and_years_element:
                episodes_and_years = episodes_and_years_element.get_text().strip().split(', ')
                cast_member['episodes'] = episodes_and_years[0]
                if len(episodes_and_years) > 1:
                    cast_member['years'] = episodes_and_years[1]
        column_marker += 1
    # skip rows that yielded nothing (e.g. separator rows)
    if len(cast_member):
        cast.append(cast_member)

print(cast[:5])
It is definitely not the most elegant solution, but I believe it does what you want.
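If you need numbers rather than strings, the episode text can be parsed with a small regex. This is only a sketch: parse_episode_note is a hypothetical helper (not part of IMDbPY or bs4), and it assumes the text looks like the notes string from the question, "(58 episodes, 2010-2020)", possibly with a single year or no year at all.

```python
import re

# Hypothetical helper, assuming text shaped like "(58 episodes, 2010-2020)".
_NOTE_RE = re.compile(
    r"(?P<count>\d+)\s+episodes?"             # "58 episodes" or "1 episode"
    r"(?:,\s*(?P<first>\d{4})"                # optional first year
    r"(?:\s*[-\u2013]\s*(?P<last>\d{4}))?)?"  # optional last year (hyphen or en dash)
)


def parse_episode_note(text):
    """Return (episode_count, first_year, last_year), or None if nothing matches."""
    match = _NOTE_RE.search(text)
    if match is None:
        return None
    count = int(match.group("count"))
    first = match.group("first")
    last = match.group("last")
    first_year = int(first) if first else None
    # A single year means the first and last appearance share that year.
    last_year = int(last) if last else first_year
    return count, first_year, last_year


print(parse_episode_note("(58 episodes, 2010-2020)"))  # (58, 2010, 2020)
```

With the parser above, you could call it on cast_member['episodes'] + ', ' + cast_member['years'] when both keys are present, or on the episodes string alone, in which case both years come back as None.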
Upvotes: 1