William Gode

Reputation: 13

Why is my parsed image link coming out in base64 format?

I was trying to parse an image link from a website. When I inspect the link on the website, it is this one: https://static.nike.com/a/images/c_limit,w_592,f_auto/t_product_v1/df7c2668-f714-4ced-9f8f-1f0024f945a9/chaussure-de-basketball-zoom-freak-3-MZpJZF.png but when I parse it with my code, the output is a base64-encoded string instead.

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').text

soup = BeautifulSoup(source, 'lxml')

pair = soup.find('div', class_='product-card__body')

image_scr = pair.find('img', class_='css-1fxh5tw product-card__hero-image')['src']
print(image_scr)

I don't think the code is the issue, but I don't know what's causing the link to come out in base64 format. How can I change the code so that it returns the .png link?

Upvotes: 1

Views: 2555

Answers (2)

HedgeHog

Reputation: 25196

What happens?

First of all, take a look at your soup; that is where the truth is. The website does not provide all of its information statically. A lot of it is loaded dynamically and filled in by the browser, so requests alone won't get it.
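One quick way to confirm this when inspecting the soup is to check whether the src you got back is an inline data: URI rather than a fetchable URL. The following is only a minimal sketch: the helper name is_inline_data_uri and the placeholder value are made up for illustration and are not taken from the page in question.

```python
def is_inline_data_uri(src: str) -> bool:
    """Return True when src is an inline data: URI instead of a normal URL."""
    return src.strip().lower().startswith("data:")


# Hypothetical placeholder, similar in shape to what a lazy-loading page may
# put into src before the browser swaps in the real image (assumption).
placeholder = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"

# The real link from the question, as seen in the browser inspector.
real_url = (
    "https://static.nike.com/a/images/c_limit,w_592,f_auto/t_product_v1/"
    "df7c2668-f714-4ced-9f8f-1f0024f945a9/"
    "chaussure-de-basketball-zoom-freak-3-MZpJZF.png"
)

for candidate in (placeholder, real_url):
    kind = "inline placeholder" if is_inline_data_uri(candidate) else "real URL"
    print(kind, "->", candidate[:40])
```

If the check returns True for the value in your soup, the static HTML only contains the placeholder and the real link has to come from somewhere else.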

Workaround

Take a look at the <noscript> next to your selection; it holds a smaller version of the image and provides the src.

Example

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').content

soup = BeautifulSoup(source, 'lxml')

pair = soup.find('div', class_='product-card__body')

image_scr = pair.select_one('noscript img.css-1fxh5tw.product-card__hero-image')['src']
print(image_scr)

Output

https://static.nike.com/a/images/c_limit,w_318,f_auto/t_product_v1/df7c2668-f714-4ced-9f8f-1f0024f945a9/chaussure-de-basketball-zoom-freak-3-MZpJZF.png

If you would like a bigger picture, just replace the parameter w_318 with w_1000.
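The width swap can also be done in code. This is a small sketch that assumes the width always appears as a single w_<number> segment in the URL, as it does in the output above; set_image_width is a hypothetical helper name, and which widths the server accepts is not verified here.

```python
import re


def set_image_width(url: str, width: int) -> str:
    """Replace the first w_<number> segment of the URL with the given width."""
    return re.sub(r"\bw_\d+", f"w_{width}", url, count=1)


# URL taken from the output of the noscript example.
small_url = (
    "https://static.nike.com/a/images/c_limit,w_318,f_auto/t_product_v1/"
    "df7c2668-f714-4ced-9f8f-1f0024f945a9/"
    "chaussure-de-basketball-zoom-freak-3-MZpJZF.png"
)

big_url = set_image_width(small_url, 1000)
print(big_url)
```

A regular expression is used instead of a plain replace('w_318', 'w_1000') so the helper still works if the page hands out a different default width.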

Edit

Concerning your comment: there are many more solutions, but which one fits depends on what you want to do with the information and what tools you are going to work with.

The following approach uses selenium, which, unlike requests, renders the website and gives you the "right" page source back, but it also needs more resources than requests:

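from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 style: pass the driver path through a Service object.
# Raw string so the backslashes in the Windows path are not treated as escapes.
driver = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))
driver.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok')

# page_source holds the HTML after the browser has rendered the page
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

pair = soup.find('div', class_='product-card__body')

image_scr = pair.select_one('img.css-1fxh5tw.product-card__hero-image')['src']
print(image_scr)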

Output

https://static.nike.com/a/images/c_limit,w_592,f_auto/t_product_v1/df7c2668-f714-4ced-9f8f-1f0024f945a9/chaussure-de-basketball-zoom-freak-3-MZpJZF.png

Upvotes: 1

Md. Fazlul Hoque

Reputation: 16187

Since you want to grab the src, meaning image data, and are downloading data from the server using requests, you need to use .content, as follows:

source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').content

Upvotes: 1
