akshay_rahar

Reputation: 1791

How to get the image src data through beautifulsoup?

I want to get the image src of every coming-soon movie from this page: Fandango.com

This is my code:

import requests
from bs4 import BeautifulSoup

def poster(genre):
    poster_link = []
    # The URL has to be a string, with the genre appended to it
    request = requests.get('http://www.fandango.com/moviescomingsoon?GenreFilter=' + genre)
    content = request.content
    soup = BeautifulSoup(content, "html.parser")
    soup2 = soup.find('div', {'class': 'movie-ls-group'})
    elements = soup2.find_all('img')

    for element in elements:
        poster_link.append(element.get('src'))

    return poster_link

When I print the poster_link list, every entry is None instead of the image source.

Upvotes: 1

Views: 2936

Answers (2)

RubyNoob

Reputation: 547

James's answer is great, but I noticed it grabs more than the images for that particular section: it also picks up the 'New + Coming Soon' section at the bottom of the page, which seems to be outside the scope of the genre and appears on other pages too. This code restricts the image grab to just the genre-specific coming soon section.

import requests
from bs4 import BeautifulSoup

def poster(genre):
    poster_link = []
    request = requests.get('http://www.fandango.com/moviescomingsoon?GenreFilter=' + genre)
    content = request.content
    soup = BeautifulSoup(content, "html.parser")
    # The first 'movie-ls-group' div is the genre-specific coming soon section
    comingsoon = soup.find_all('div', {'class': 'movie-ls-group'})
    movies = comingsoon[0].find_all('img', {'class': 'visual-thumb'})
    for movie in movies:
        # The poster URL is stored in 'data-src', not 'src'
        poster_link.append(movie.get('data-src'))
    return poster_link

print(poster('Horror'))

You might also want to filter out the 'emptysource.jpg' images in your poster_link array before returning it, as they look like empty placeholders for movies without poster images.
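
One way to do that filtering, as a rough sketch. It assumes the placeholders can be recognised by the substring 'emptysource.jpg' in the URL, and it also drops None entries. The helper name drop_placeholders and the sample URLs are made up for illustration; they are not from the Fandango page.

```python
def drop_placeholders(links, marker="emptysource.jpg"):
    """Return only the links that look like real poster URLs."""
    # Skip None (the tag had no data-src) and any URL containing the marker.
    return [link for link in links if link and marker not in link]


# Made-up sample data standing in for the list built by poster().
sample = [
    "http://example.com/posters/movie1.jpg",
    "http://example.com/images/emptysource.jpg",
    None,
    "http://example.com/posters/movie2.jpg",
]
print(drop_placeholders(sample))
```

In poster() you would then end with something like `return drop_placeholders(poster_link)`.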

Upvotes: 1

James

Reputation: 36608

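Try this. It skips the subsetting step and grabs every image that has the proper class. Note that it reads the 'data-src' attribute rather than 'src': judging by the None values you are getting, the thumbnail tags on that page do not carry a 'src' attribute, so element.get('src') has nothing to return.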

import requests
from bs4 import BeautifulSoup

def poster(genre):
    poster_link = []
    request = requests.get('http://www.fandango.com/moviescomingsoon?GenreFilter=%s' % genre)
    content = request.content
    soup = BeautifulSoup(content, "html.parser")
    # Select every thumbnail image on the page by its class
    imgs = soup.find_all('img', {'class': 'visual-thumb'})

    for img in imgs:
        # The poster URL is stored in 'data-src', not 'src'
        poster_link.append(img.get('data-src'))
    return poster_link
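
If you are not sure which attribute a given page uses, a small fallback helper can try 'data-src' first and then 'src'. This is only a sketch: first_image_url is a hypothetical helper name, and it works on anything with a dict-style .get(), which includes BeautifulSoup tags. The plain dicts below merely stand in for parsed img tags, and the URLs are made up.

```python
def first_image_url(tag, attrs=("data-src", "src")):
    """Return the first non-empty attribute value from attrs, or None."""
    for name in attrs:
        value = tag.get(name)
        if value:
            return value
    return None


# Plain dicts standing in for <img> tags (made-up URLs).
lazy_img = {"class": ["visual-thumb"], "data-src": "http://example.com/a.jpg"}
plain_img = {"src": "http://example.com/b.jpg"}
no_img = {"alt": "no poster"}

print(first_image_url(lazy_img))
print(first_image_url(plain_img))
print(first_image_url(no_img))
```

Inside the loop above, that would replace the append line with `poster_link.append(first_image_url(img))`.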

Upvotes: 1
