Muhammad Faisal

Reputation: 131

How do I scrape image-src in BeautifulSoup?

I am trying to get the value of the image-src attribute from this tag:

<img alt='Original Xiaomi Redmi Note 5 4GB RAM 64GB ROM Snapdragon S636 Octa Core Mobile Phone MIUI9 5.99" 2160*1080 4000mAh 12.0+5.0MP(China)' class="picCore" id="limage_32856997152" image-src="//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg" itemprop="image"/>

I tried this code but it is not working:

images = soup.find('img').get('image-src')

Usually I use get('src') and it works, but here I need the image-src attribute, and that does not work.

Upvotes: 6

Views: 14268

Answers (4)

Selena Garcia Lobo

Reputation: 11

If you want to get the src, you can do this, replacing the placeholder attribute name and value with the ones on your tag:

new_var = soup.find(attrs={"attribute": "name_attr"})
imageItem = new_var.get('src')
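
For the attribute in the question, the same find(attrs=...) pattern might look like the sketch below. It assumes the target is an img tag carrying a custom image-src attribute; the shortened markup, the example.com URL and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup modelled on the tag in the question
html = '<img class="picCore" id="limage_1" image-src="//example.com/a.jpg_220x220.jpg"/>'
soup = BeautifulSoup(html, "html.parser")

# attrs={"image-src": True} matches any img that has the attribute, whatever its value
tag = soup.find("img", attrs={"image-src": True})

# find() returns None when nothing matches, so guard before calling .get()
image_src = tag.get("image-src") if tag is not None else None
print(image_src)
```

Filtering on the attribute itself avoids picking up an earlier img on the page that only has a plain src.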

Upvotes: 1

QHarr

Reputation: 84465

You can use a CSS id selector, if the id is static, to select the element, then index it to get the image-src attribute:

from bs4 import BeautifulSoup as bs

html = '''
<img alt='Original Xiaomi Redmi Note 5 4GB RAM 64GB ROM Snapdragon S636 Octa Core Mobile Phone MIUI9 5.99" 2160*1080 4000mAh 12.0+5.0MP(China)' class="picCore" id="limage_32856997152" image-src="//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg" itemprop="image"/>
'''
soup = bs(html, 'lxml')
print(soup.select_one('#limage_32856997152')['image-src'])

If the id is not static, or there can be more than one element to target, you might want to use a class selector combined with an attribute selector:

srcs = [img['image-src'] for img in soup.select('.picCore[image-src]')]
print(srcs)

To match any element that has an image-src attribute, just use an attribute selector:

srcs = [img['image-src'] for img in soup.select('[image-src]')]
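
To see how the two selectors differ, here is a small sketch on a page with several images. The markup and the example.com URLs are invented for the demonstration, and html.parser is used only so the snippet does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical page: three images, only two of which carry image-src
html = '''
<img class="picCore" image-src="//example.com/a.jpg"/>
<img class="picCore" src="//example.com/b.jpg"/>
<img class="other" image-src="//example.com/c.jpg"/>
'''
soup = BeautifulSoup(html, "html.parser")

# Class + attribute selector: only .picCore elements that have image-src
with_class = [img["image-src"] for img in soup.select(".picCore[image-src]")]

# Bare attribute selector: every element that has image-src, whatever its class
any_element = [el["image-src"] for el in soup.select("[image-src]")]

print(with_class)
print(any_element)
```

Because both selectors require the attribute to be present, indexing with ['image-src'] inside the comprehension should not raise a KeyError.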

Upvotes: 1

Salvatore

Reputation: 11952

Looking at the Beautiful Soup documentation, I found the find_all method, which worked for me in this case:

for link in soup.find_all('img'):
    print(link.get('image-src'))

Here is my full code:

from bs4 import BeautifulSoup

html_doc = """
<img alt='Original Xiaomi Redmi Note 5 4GB RAM 64GB ROM Snapdragon S636 Octa Core Mobile Phone MIUI9 5.99" 2160*1080 4000mAh 12.0+5.0MP(China)' class="picCore" id="limage_32856997152" image-src="//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg" itemprop="image"/>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

for link in soup.find_all('img'):
    print(link.get('image-src'))

and the result:

//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg

Upvotes: 5

Bitto

Reputation: 8215

You can access a tag’s attributes by treating the tag like a dictionary. You can also access that dictionary directly as .attrs:

soup.find('img').attrs['image-src']
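
One thing to keep in mind is that dictionary-style access raises KeyError when the attribute is missing, while .get() returns None. The sketch below illustrates the difference on two made-up tags; the markup and the example.com URLs are assumptions for the example, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first img has image-src, the second only has src
html = '<img image-src="//example.com/a.jpg"/><img src="//example.com/b.jpg"/>'
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("img")

# Dictionary access works when the attribute exists
present = first.attrs["image-src"]

# .get() returns None for a missing attribute instead of raising
absent = second.get("image-src")

# Dictionary access on a missing attribute raises KeyError
try:
    second.attrs["image-src"]
    raised = False
except KeyError:
    raised = True

print(present, absent, raised)
```

If the first img on the real page has no image-src, soup.find('img') would behave like the second tag here, which may explain why the call in the question did not return the expected value.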

Upvotes: 1
