Jaoa

Python BeautifulSoup to Extract Image tags

I am trying to extract image links from an IMDb webpage.

For example, https://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1 has the image element <img alt="Avatar Poster" title="Avatar Poster" src="https://m.media-amazon.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw@@._V1_UX182_CR0,0,182,268_AL_.jpg">
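
As a minimal sketch, assuming only the <img> element quoted above (not the live IMDb page, whose markup may differ), BeautifulSoup can locate the poster through an attribute the element actually carries, such as alt or title, and then read src. The variable names are illustrative; Python 3 and the built-in 'html.parser' are assumed.

```python
from bs4 import BeautifulSoup

# Only the poster element quoted in the question, not the full IMDb page
snippet = ('<img alt="Avatar Poster" title="Avatar Poster" '
           'src="https://m.media-amazon.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw@@._V1_UX182_CR0,0,182,268_AL_.jpg">')

soup = BeautifulSoup(snippet, 'html.parser')

# Match on the alt attribute the element carries
by_alt = soup.find('img', alt='Avatar Poster')

# CSS selector alternative: any <img> whose title ends with " Poster"
by_css = soup.select_one('img[title$=" Poster"]')

print(by_alt['src'])
print(by_css['src'])
```

The CSS form is less tied to one movie title, which matters when looping over many pages.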

Below is the code I am using; it does not give me the image URL.

In my code, row[17] is the IMDb link I am trying to use; it comes from the dataset at

https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset

import csv
from bs4 import BeautifulSoup
import urllib2

with open('movie_metadata.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print('Column names are {}'.format(", ".join(row)))
            line_count += 1
        else:
            imdb_link = row[17]
            soup = BeautifulSoup(urllib2.urlopen(imdb_link).read(), features="html.parser")
            link = soup.find(itemprop="img")
            print('\t{} =====> {} =====> {} ====> {}.'.format(row[-1], row[11], row[17], link["src"]))
            line_count += 1

Running the code (under Python 2, since it uses urllib2) raises TypeError: 'NoneType' object has no attribute '__getitem__'.
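
A likely explanation, sketched on the quoted <img> element only: soup.find() returns None when no tag matches, and that element has no itemprop attribute, so find(itemprop="img") matches nothing in it; link["src"] then subscripts None, which is what raises the TypeError (Python 3 words it as "'NoneType' object is not subscriptable"). The helper poster_src below is hypothetical and assumes the poster's title attribute ends in " Poster", as in the quoted element.

```python
from bs4 import BeautifulSoup

snippet = ('<img alt="Avatar Poster" title="Avatar Poster" '
           'src="https://m.media-amazon.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw@@._V1_UX182_CR0,0,182,268_AL_.jpg">')
soup = BeautifulSoup(snippet, 'html.parser')

# find() returns None when no tag matches; the quoted <img> has no itemprop attribute
missing = soup.find(itemprop="img")
print(missing)  # None

try:
    missing["src"]  # subscripting None is what raises the TypeError
except TypeError as exc:
    print(type(exc).__name__)  # TypeError

def poster_src(soup):
    """Hypothetical helper: return the poster URL, or None if no poster <img> is found."""
    # Attribute filter function: receives the title value, or None if the tag has no title
    img = soup.find('img', title=lambda t: t is not None and t.endswith(' Poster'))
    # Checking for None first avoids the TypeError; Tag.get() returns None for a missing attribute
    return img.get('src') if img is not None else None

print(poster_src(soup) is not None)  # True
```

Checking for None before reading src lets a loop over many pages skip a missing poster instead of crashing.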

Answers (1)

Arn

You could simplify your code by using requests together with Beautiful Soup, which makes it much easier to debug:

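import requests
from bs4 import BeautifulSoup
url = 'https://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1'
r = requests.get(url)
r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
# 'html5lib' is a separate install (pip install html5lib); 'html.parser' is built in
soup = BeautifulSoup(r.content, 'html5lib')
# Match the <img> by the title attribute shown in the question
poster = soup.find('img', {'title': 'Avatar Poster'})
print(poster.get('src') if poster is not None else 'No poster <img> found')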
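
The same idea can be combined with the CSV loop from the question. This is a sketch under several assumptions: the CSV header is taken to name the columns 'movie_title' and 'movie_imdb_link' (hypothetical here; check the actual file), the poster <img> is assumed to have a title ending in " Poster" as in the question's example, and fetch is any callable that returns a page's HTML for a URL. Parsing is kept separate from downloading so it can be tested offline.

```python
import csv
import io
from bs4 import BeautifulSoup

def poster_src(html):
    """Return the src of the first <img> whose title ends in ' Poster', or None."""
    soup = BeautifulSoup(html, 'html.parser')
    img = soup.find('img', title=lambda t: t is not None and t.endswith(' Poster'))
    return img.get('src') if img is not None else None

def poster_links(csv_file, fetch):
    """Yield (title, imdb_link, poster_url) for each data row.

    Assumes hypothetical column names 'movie_title' and 'movie_imdb_link'.
    fetch is any callable that takes a URL and returns the page HTML.
    """
    # DictReader consumes the header row itself, so no line_count bookkeeping is needed
    for row in csv.DictReader(csv_file):
        link = row['movie_imdb_link']
        yield row['movie_title'].strip(), link, poster_src(fetch(link))

# Offline demo: a tiny in-memory CSV and a fake fetch (dict lookup) instead of HTTP
sample_csv = io.StringIO(
    "movie_title,movie_imdb_link\n"
    "Avatar,https://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1\n"
)
fake_pages = {
    "https://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1":
        '<img alt="Avatar Poster" title="Avatar Poster" src="https://example.com/avatar.jpg">',
}
results = list(poster_links(sample_csv, fake_pages.get))
for title, link, poster in results:
    print(title, poster)
```

With the real data, open('movie_metadata.csv', newline='') and something like fetch=lambda url: requests.get(url).text would replace the demo values. Looking columns up by name avoids relying on positions such as row[17], and a missing poster yields None rather than a TypeError.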
