Reputation: 71
I'm working on a project and I'm trying to extract the pictures' URL from a website. I'm a noob at this so please bear with me. Based on the HTML code, the class of the pictures that I want is "fotorama__img". However, when I execute my code, it doesn't seem to work. Anyone knows why that's the case? Also, how come the src
attribute doesn't contain the whole URL, just a part of it? Example: the link to the image is https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_callout_front.jpg
but the src
attribute of the img
tag is "/files_SYS/images/System/sysThumb/SYS-120U-TNR_main.png"
.
Here is my code:
from bs4 import BeautifulSoup
import requests
page = requests.get("https://www.supermicro.com/en/products/system/Ultra/1U/SYS-120U-TNR")
soup = BeautifulSoup(page.content,'lxml')
images = soup.find_all("img", {"class": "fotorama__img"})
for image in images:
print(image.get("src"))
And here is the picture of the HTML code for the page
Thank you for your help!
Upvotes: 1
Views: 892
Reputation: 195438
The class is added dynamically via JavaScript, so beautifulsoup doesn't see it. To extract the images from this site, you can do:
import requests
from bs4 import BeautifulSoup
page = requests.get(
"https://www.supermicro.com/en/products/system/Ultra/1U/SYS-120U-TNR"
)
soup = BeautifulSoup(page.content, "lxml")
images = [
"https://www.supermicro.com" + a["href"]
for a in soup.select(".fotorama > a")
]
print(*images, sep="\n")
Prints:
https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_main.png
https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_callout_angle.jpg
https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_callout_top.jpg
https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_callout_front.jpg
https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_callout_rear.jpg
Upvotes: 1