Reputation: 13
I was trying to parse an image link from a website.
When I inspect the image on the website, the link is https://static.nike.com/a/images/c_limit,w_592,f_auto/t_product_v1/df7c2668-f714-4ced-9f8f-1f0024f945a9/chaussure-de-basketball-zoom-freak-3-MZpJZF.png, but when I parse it with my code the output is data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').text
soup = BeautifulSoup(source, 'lxml')
pair = soup.find('div', class_='product-card__body')
image_scr = pair.find('img', class_='css-1fxh5tw product-card__hero-image')['src']
print(image_scr)
I don't think the code is the issue, but I don't know what is causing the link to come out in base64 format. How can I change the code so that it returns the .png link?
Upvotes: 1
Views: 2555
Reputation: 25196
First of all, take a look at your soup: that is where the truth is. The website does not provide all of its information statically; a lot of it is provided dynamically and filled in by the browser, so requests will not get the real image link this way.
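To see what the base64 value in the question actually is, it can be decoded by hand. The sketch below is only an illustration: the helper name decode_data_uri is made up here, and the string is copied from the question. It reads the GIF header fields, which suggests the value is a tiny placeholder image rather than the product picture.

```python
import base64

# The src value printed by the code in the question.
DATA_URI = (
    "data:image/gif;base64,"
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def decode_data_uri(uri):
    """Split a base64 data URI into its MIME type and raw bytes (hypothetical helper)."""
    header, _, payload = uri.partition(",")
    mime = header[len("data:"):].split(";")[0]
    return mime, base64.b64decode(payload)


mime, raw = decode_data_uri(DATA_URI)

# A GIF file starts with a 6-byte signature, followed by the
# width and height as 16-bit little-endian integers.
signature = raw[:6]
width = int.from_bytes(raw[6:8], "little")
height = int.from_bytes(raw[8:10], "little")

print(mime, signature, width, height)
```

A 1x1 GIF like this is commonly used as a stand-in until a script swaps in the real image, which fits the explanation above.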
Take a look at the <noscript> element next to your selection: it holds a smaller version of the image and provides the real src.
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').content
soup = BeautifulSoup(source, 'lxml')
pair = soup.find('div', class_='product-card__body')
image_scr = pair.select_one('noscript img.css-1fxh5tw.product-card__hero-image')['src']
print(image_scr)
If you want a bigger picture, just replace the parameter w_318 in the URL with w_1000.
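The width swap can also be done in code. This is a minimal sketch: the function name with_width is hypothetical, the example URL is the one from the question, and it assumes the width always appears as a w_<number> segment after a comma or slash. Whether the server accepts any particular width is not something this snippet checks.

```python
import re


def with_width(url, width):
    """Return url with its first w_<number> segment replaced (hypothetical helper)."""
    # Only match w_<digits> right after ',' or '/' so other parts of the URL are untouched.
    return re.sub(r"(?<=[,/])w_\d+", f"w_{width}", url, count=1)


small = (
    "https://static.nike.com/a/images/c_limit,w_592,f_auto/t_product_v1/"
    "df7c2668-f714-4ced-9f8f-1f0024f945a9/"
    "chaussure-de-basketball-zoom-freak-3-MZpJZF.png"
)
big = with_width(small, 1000)
print(big)
```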
Concerning your comment: there are many more solutions, but which one fits depends on what you want to do with the information and what you are going to work with.
The following approach uses selenium, which, unlike requests, renders the website and gives you the "right page source" back, but also needs more resources than requests:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')
driver.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok')
soup=BeautifulSoup(driver.page_source, 'html.parser')
pair = soup.find('div', class_='product-card__body')
image_scr = pair.select_one('img.css-1fxh5tw.product-card__hero-image')['src']
print(image_scr)
Upvotes: 1
Reputation: 16187
Since you want to grab the src, meaning the image data, and you are downloading data from the server using requests, you need to use .content, as follows:
source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').content
Upvotes: 1