Reputation: 95
I am trying to extract image links from IMDb web pages.
For example, https://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1
has image element <img alt="Avatar Poster" title="Avatar Poster" src="https://m.media-amazon.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw@@._V1_UX182_CR0,0,182,268_AL_.jpg">
Below is the code I am using; it does not return the image URL.
row[17] in my code is the IMDb link I am trying to use. The dataset it comes from can be found at
https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset
import csv
from bs4 import BeautifulSoup
import urllib2

with open('movie_metadata.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print('Column names are {}'.format(", ".join(row)))
            line_count += 1
        else:
            imdb_link = row[17]
            soup = BeautifulSoup(urllib2.urlopen(imdb_link).read(), features="html.parser")
            link = soup.find(itemprop="img")
            print('\t{} =====> {} =====> {} ====> {}.'.format(row[-1], row[11], row[17], link["src"]))
            line_count += 1
When I run the code, I get:
TypeError: 'NoneType' object has no attribute '__getitem__'
Upvotes: 0
Views: 860
Reputation: 2015
Why not simplify your code by using requests with Beautiful Soup, so that it is easier to debug:
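import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1'
r = requests.get(url)
# The 'html5lib' parser must be installed separately (pip install html5lib)
soup = BeautifulSoup(r.content, 'html5lib')
# Look the poster up by its title attribute and print its src
print(soup.find('img', {'title': 'Avatar Poster'}).get('src'))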
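The TypeError in the question comes from indexing the result of find() when nothing matched: find() returns None in that case, and None["src"] fails. Below is a minimal sketch of a guarded lookup, written for Python 3 and run against an inline HTML string instead of a live request. The helper name poster_src, the SAMPLE_HTML wrapper markup, and the assumption that every poster's title ends with "Poster" are mine, based only on the single Avatar example; the real IMDb markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; only the <img> tag is
# taken from the question, the surrounding markup is made up.
SAMPLE_HTML = """
<html><body>
<div>
<img alt="Avatar Poster" title="Avatar Poster" src="https://m.media-amazon.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw@@._V1_UX182_CR0,0,182,268_AL_.jpg">
</div>
</body></html>
"""


def poster_src(html):
    """Return the src of the first <img> whose title ends with 'Poster', or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: poster images carry a title such as "<Movie> Poster".
    # The filter function receives None for tags without a title attribute.
    img = soup.find(
        "img",
        attrs={"title": lambda t: t is not None and t.endswith("Poster")},
    )
    # find() returns None when nothing matches, so check before reading src.
    if img is None:
        return None
    return img.get("src")


if __name__ == "__main__":
    print(poster_src(SAMPLE_HTML))
    # A page without a matching image yields None instead of raising.
    print(poster_src("<html><body><p>no image here</p></body></html>"))
```

In the CSV loop from the question, the same check would let you skip or log rows whose page has no matching image rather than stopping at the first miss.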
Upvotes: 3