Nirmal Patel

Reputation: 61

How to check whether a tag contains a specific attribute or not?

I want to scrape img tags from the content, but the problem is that some of the img tags contain data-src and some contain src.

I have tried the following:

if(content.find('img',{'itemprop':'contentUrl'})['data-src']):
    image=content.find('img',{'itemprop':'contentUrl'})['data-src']
elif(content.find('img',{'itemprop':'contentUrl'})['src']):
    image=content.find('img',{'itemprop':'contentUrl'})['src']

It's still not working. I want to scrape every image URL, whether the tag contains data-src or src.

Upvotes: 1

Views: 203

Answers (3)

QHarr

Reputation: 84465

You can use the CSS selector "or" syntax (a comma-separated selector list) to gather the img tags that have either attribute, and then use a nested .get to read whichever attribute is present.

from bs4 import BeautifulSoup as bs

html = '''
<img src="mePlease.gif" alt="Yey" height="42" width="42">
<img data-src="me2.gif" alt="Yey" height="42" width="42">
'''
soup = bs(html, 'lxml')
attrs = [i.get('src', i.get('data-src', None)) for i in soup.select('img[src],img[data-src]')]
print(attrs)
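For context on why the attempt in the question fails: subscripting a tag with an attribute it does not have, such as tag['data-src'], raises KeyError, so the elif branch is never reached, whereas .get returns None instead. A minimal sketch of the same idea applied to the asker's itemprop filter follows. The markup and file names are made up for illustration, it uses the built-in html.parser rather than lxml, and it prefers data-src over src, as the question's code does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the asker's page content
html = '''
<img itemprop="contentUrl" data-src="lazy.jpg">
<img itemprop="contentUrl" src="eager.jpg">
<img itemprop="contentUrl" alt="no url here">
'''
soup = BeautifulSoup(html, 'html.parser')

urls = []
for img in soup.find_all('img', {'itemprop': 'contentUrl'}):
    # .get returns None for a missing attribute instead of raising KeyError
    url = img.get('data-src') or img.get('src')
    if url:
        urls.append(url)

print(urls)
```

The third img has neither attribute, so it is skipped rather than causing an error.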

Upvotes: 1

KunduK

Reputation: 33384

Try this with item.attrs.

for item in content.select('img[itemprop="contentUrl"]'):
    if 'data-src' in item.attrs:
        print(item['data-src'])
    if 'src' in item.attrs:
        print(item['src'])
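If a tag can carry both attributes (for example a lazy-loaded image with a placeholder in src), the two independent if checks above would print both values. A small sketch of picking exactly one URL per image is shown here, assuming data-src should win when both are present. The markup is invented for illustration, and has_attr is used as an equivalent of the `in item.attrs` test.

```python
from bs4 import BeautifulSoup

# Hypothetical tag that has both a placeholder src and a real data-src
html = '<img itemprop="contentUrl" src="placeholder.gif" data-src="real.jpg">'
img = BeautifulSoup(html, 'html.parser').find('img')

# Check for the attribute before subscripting, so no KeyError is raised
if img.has_attr('data-src'):
    image = img['data-src']
elif img.has_attr('src'):
    image = img['src']
else:
    image = None

print(image)
```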

Upvotes: 1

Wonka

Reputation: 1886

Try passing a lambda to find_all, something like this (this version matches only img tags that have a src attribute):

img_l = lambda tag: (getattr(tag, "name") == "img" and "src" in tag.attrs)
images = content.find_all(img_l)    
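To cover the data-src case from the question as well, the predicate could be extended along these lines. This is only a sketch: the function name has_image_url and the sample markup are hypothetical, and a named function is used in place of the lambda purely for readability.

```python
from bs4 import BeautifulSoup

def has_image_url(tag):
    # Match img tags that carry either a src or a data-src attribute
    return tag.name == 'img' and (tag.has_attr('src') or tag.has_attr('data-src'))

# Hypothetical markup: two usable images, one without a URL, one non-img tag
html = '<div><img src="a.png"><img data-src="b.png"><img alt="none"><p src="x"></p></div>'
soup = BeautifulSoup(html, 'html.parser')

images = soup.find_all(has_image_url)
urls = [img.get('src') or img.get('data-src') for img in images]
print(urls)
```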

Upvotes: 1
