Reputation: 3812
I am writing a web scraper using Python's bs4.
I am trying to find the first image that has a certain attribute, 'data-a-dynamic-image'. So far I have the code below, and it works. However, I would prefer to use only find(), not find_all(), because I only care about the first item on the page with that attribute and don't want to waste time sifting through the entire page.
def siftImage(soup):
    try:
        for line in soup.find_all('img'):
            if line is not None:
                if line.has_attr('data-a-dynamic-image'):
                    return line['src']
    except:
        return 'No Image '
This second function returns the result I want only if the first image on the page is the one I am looking for; otherwise it returns nothing. It does, however, have the runtime I am looking for.
def siftImageTwo(soup):
    try:
        line = soup.find('img')
        if line.has_attr('data-a-dynamic-image'):
            return line['src']
    except:
        return 'No Image '
I am looking for some way to have the functionality of the top script with the timing of the bottom script.
Upvotes: 1
Views: 418
Reputation: 60
According to the official documentation, there is a way to search by custom data-* attributes.
You should try this:
line = soup.find('img', attrs={'data-a-dynamic-image': True})
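To show how that call could replace the loop in the question, here is a minimal sketch. The sample HTML, the `sift_image` name, and the `default` parameter are made up for illustration. It assumes the stock `html.parser` backend is acceptable.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the second and third <img> carry the attribute.
SAMPLE_HTML = """
<html><body>
  <img src="banner.png">
  <img src="product.jpg" data-a-dynamic-image='{"product.jpg": [500, 500]}'>
  <img src="other.jpg" data-a-dynamic-image='{"other.jpg": [100, 100]}'>
</body></html>
"""


def sift_image(soup, default="No Image"):
    """Return the src of the first <img> that has data-a-dynamic-image."""
    # attrs={name: True} matches any tag that has the attribute,
    # regardless of its value; find() stops at the first match.
    tag = soup.find("img", attrs={"data-a-dynamic-image": True})
    if tag is None:
        return default
    # .get() avoids a KeyError if the matching tag has no src attribute.
    return tag.get("src", default)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(sift_image(soup))  # prints "product.jpg"
```

Checking for `None` explicitly means the fallback string is returned when nothing matches, which the bare `except` in the question does not guarantee. If you prefer CSS selectors, `soup.select_one('img[data-a-dynamic-image]')` should give the same first match.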
Upvotes: 3