Rorschach

Reputation: 3812

How to use Beautiful Soup's find() instead of find_all() for better runtime

I am writing a web scraper using Python's bs4. I am trying to find the first image that has a certain attribute, 'data-a-dynamic-image'. So far I have the code below, and it works. However, I would prefer to use only find(), not find_all(), because I only care about the first item on the page with that attribute and don't want to waste time sifting through the entire page.

def siftImage(soup):
    try:
        for line in soup.find_all('img'):
            if line is not None:
                if line.has_attr('data-a-dynamic-image'):
                    return line['src']

    except:
        return 'No Image '

This second function returns the result I want only if the first image on the page is the one I'm looking for; otherwise it returns nothing. It does, however, have the runtime I am looking for.

def siftImageTwo(soup):
    try:
        line = soup.find('img')
        if line.has_attr('data-a-dynamic-image'):
            return line['src']

    except:
        return 'No Image '

I am looking for some way to have the functionality of the top script with the timing of the bottom script.

Upvotes: 1

Views: 418

Answers (1)

cbq

Reputation: 60

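According to the official documentation, you can search by custom data-* attributes by passing them in the attrs dictionary. They cannot be used as keyword arguments because their names contain hyphens. Using True as the value matches any tag that has the attribute, whatever its value.
You should try this: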

line = soup.find('img', attrs={'data-a-dynamic-image': True})
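For context, here is a minimal sketch of how that call might slot into a complete function. The name sift_image, the default argument, and the sample HTML are assumptions made for illustration and are not taken from the question. The built-in "html.parser" is used so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, for illustration only.
SAMPLE_HTML = """
<html><body>
  <img src="logo.png">
  <img src="product.jpg" data-a-dynamic-image='{"product.jpg": [500, 500]}'>
  <img src="other.jpg" data-a-dynamic-image='{"other.jpg": [100, 100]}'>
</body></html>
"""


def sift_image(soup, default="No Image"):
    """Return the src of the first <img> that carries data-a-dynamic-image."""
    # find() returns only the first match; True matches any attribute value.
    img = soup.find("img", attrs={"data-a-dynamic-image": True})
    if img is None:
        # No <img> on the page has the attribute.
        return default
    # .get() avoids a KeyError if the matching tag has no src attribute.
    return img.get("src", default)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(sift_image(soup))

# A page whose only image lacks the attribute falls back to the default.
print(sift_image(BeautifulSoup("<img src='a.png'>", "html.parser")))
```

Checking the result against None explicitly, rather than wrapping everything in a bare except, keeps unrelated errors visible while still handling the "no matching image" case.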

Upvotes: 3
