KWANG

Reputation: 23

Unable to grab the 'src' attribute for an image with Beautiful Soup

I'm working on a web scraper to download content from my school newspaper's website so it can be re-uploaded to our upcoming new website. Right now I'm testing how to download the images from a page with bs4. However, as shown in my code below, I can't find the 'src' attribute of the image (i.e. its URL), which I need in order to download it.

import requests, bs4

url = 'https://www.behrendbeacon.com/parkingconcernsaddressed'
res = requests.get(url)
res.raise_for_status()

soup = bs4.BeautifulSoup(res.text, 'html.parser')
imgElems = soup.select('img')
print(imgElems[2])
# prints <img alt="18160.jpeg" data-type="image" id="comp-jpa6qz48imgimage"/>

So for further explanation:

1.) If you go to the URL and inspect the page with the browser's developer tools, you will see that imgElems[2] is the main image in the news article I'm trying to grab. Here's an image to illustrate what I mean:

Here's the web page screenshot

2.) The reason I'm printing imgElems[2] is to demonstrate that Beautiful Soup doesn't pick up the 'src' attribute along with the rest of the tag's data.

In short, can someone explain what I'm missing? Could this inability to grab the 'src' attribute be due to the fact that the website is a Wix site? Thank you for any help you can give.

Upvotes: 2

Views: 526

Answers (1)

chitown88

Reputation: 28565

It might just be that the page needs to render first because it's dynamic. I believe the requests-html package can do that (although there seems to be a bug with it if you're trying to use it with Spyder), but I'm not too familiar with it. At some point, I will have to learn/play around with it.
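One quick way to check this is to look at which attributes the tag actually carries in the HTML that requests receives. Here's a minimal sketch using the exact tag printed in the question as a string literal (no network access; the variable names are just for illustration). If 'src' isn't in the static markup, it is presumably being added later by JavaScript in the browser:

```python
import bs4

# The tag exactly as printed in the question (static HTML, no JavaScript run).
static_tag_html = '<img alt="18160.jpeg" data-type="image" id="comp-jpa6qz48imgimage"/>'

soup = bs4.BeautifulSoup(static_tag_html, 'html.parser')
img = soup.find('img')

# List the attribute names that are actually present in the markup.
attr_names = sorted(img.attrs)
print(attr_names)

# .get() returns None instead of raising KeyError when the attribute is absent.
src = img.get('src')
print(src)
```

So Beautiful Soup isn't dropping anything here; it can only return attributes that exist in the HTML it was handed.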

In the meantime, I've used Selenium to work with dynamic pages. Selenium worked for me on this one:

import bs4 
from selenium import webdriver 

url = 'https://www.behrendbeacon.com/parkingconcernsaddressed'

browser = webdriver.Chrome()
browser.get(url)

res = browser.page_source

soup = bs4.BeautifulSoup(res, 'html.parser')
# find() returns only the first <img> in the rendered page
imgSrc = soup.find('img').get('src')

print(imgSrc)
# prints https://static.wixstatic.com/media/7384a7_7bb56fcbcb6c48c0875c93a2b6c9821c~mv2.jpg/v1/fill/
#        w_820,h_151,al_c,q_80,usm_0.66_1.00_0.01/7384a7_7bb56fcbcb6c48c0875c93a2b6c9821c~mv2.webp

browser.close()
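Since the goal is to download every image rather than just the first one, here is a rough sketch of how the rendered HTML could be turned into a list of image URLs plus local file names. The helper names extract_image_urls and filename_from_url are made up for this example, and the sample HTML and example.com URL are placeholders, not taken from the actual page:

```python
import posixpath
from urllib.parse import urlsplit, unquote

import bs4


def extract_image_urls(html):
    """Return the src of every <img> that has a src attribute."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    # src=True keeps only tags where the attribute is present.
    return [img['src'] for img in soup.find_all('img', src=True)]


def filename_from_url(url):
    """Use the last path segment of the URL as a local file name."""
    path = urlsplit(url).path
    return unquote(posixpath.basename(path))


# Placeholder HTML standing in for browser.page_source.
sample_html = (
    '<div>'
    '<img alt="no-src" id="a"/>'
    '<img src="https://example.com/media/photo%201.jpg"/>'
    '</div>'
)

urls = extract_image_urls(sample_html)
names = [filename_from_url(u) for u in urls]
print(urls)
print(names)
```

With Selenium you would pass browser.page_source in place of sample_html. Each URL could then be fetched with requests.get and its .content written to a file opened in 'wb' mode, though I haven't tested that part against this site.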

Upvotes: 3
