modLmakur

Reputation: 573

Find href values filtering by class with Beautiful Soup

I have page source that has "a class" links like the example below. I would like to return a list containing all the "href" values, so for the example below I want "/detail/Request-Technology%2C-LLC-Oakland-CA-94609/napil006/cyberMan". I'm using Beautiful Soup's find_all() and trying to filter on the 'a class' attribute, but it doesn't return anything. Can anyone see what I'm doing wrong and suggest a solution?

source:

<a class="web-btn-link easy-click" href="/detail/Request-Technology%2C-LLC-Oakland-CA-94609/napil006/cyberMan" id="position15" onclick="cookieJobID('b54b4b964def18552eefff31d034d2a5');handleBackButton(this);" style="font-size:18px;" title="stuff" value="b54b4b964def18552eefff31d034d2a5">

Code:

BeautifulSoup(driver.page_source).find_all('href', {'a class':'web-btn-link easy-click'})

output:

[]

Upvotes: 0

Views: 2986

Answers (2)

Martijn Pieters

Reputation: 1122392

Your first mistake is passing an attribute name to find_all(), which interprets the first argument as a tag name. Next, you are asking find_all() to return only those tags that have an "a class" attribute matching the given value, but tags can't have attribute names containing a space.
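To see both mistakes concretely, here is a minimal sketch that parses a trimmed-down copy of the link from the question (the html string is an assumption; only the attributes relevant here are kept) and compares the question's call with a plain search for a tags.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the link from the question (illustrative only).
html = '<a class="web-btn-link easy-click" href="/detail/Request-Technology%2C-LLC-Oakland-CA-94609/napil006/cyberMan" id="position15">stuff</a>'
soup = BeautifulSoup(html, "html.parser")

# The question's call: "href" is treated as a tag name, and no attribute
# can be named "a class", so nothing matches.
original = soup.find_all('href', {'a class': 'web-btn-link easy-click'})

# Using the right tag name but keeping the bogus attribute still finds nothing.
bogus_attr = soup.find_all('a', {'a class': 'web-btn-link easy-click'})

# Searching for <a> tags finds the link.
links = soup.find_all('a')

print(len(original))    # 0
print(len(bogus_attr))  # 0
print(len(links))       # 1
```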

Note that you don't have "a class" tags here; you have a tags with class and href attributes. So you'd want to use:

soup = BeautifulSoup(driver.page_source, 'html.parser')
tags = soup.find_all('a', {'class': 'web-btn-link', 'href': True})

The 'href': True filter only matches tags that have that attribute defined. Note that I filter on just one of the two classes; see the "Searching by CSS class" section of the Beautiful Soup documentation for why this matters. In short, you generally don't want to exclude tags that have more than just the two classes you've found. In the vast majority of documents you only need to match one of the classes (and easy-click sounds like a class for a script or CSS enhancement, potentially applied to different elements on the page).
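The following sketch illustrates why filtering on a single class is more robust. The sample HTML, including the extra "featured" class, is made up for illustration. In Beautiful Soup, a class filter containing a space is compared against the attribute's whole string value, so a tag with an extra class is missed, whereas a single class name matches any tag that has that class among others.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link has an extra class, the third has no href.
html = '''
<a class="web-btn-link easy-click" href="/detail/one">one</a>
<a class="web-btn-link easy-click featured" href="/detail/two">two</a>
<a class="web-btn-link easy-click">no link</a>
'''
soup = BeautifulSoup(html, "html.parser")

# Exact string match on the whole class value: misses the link with the extra class.
exact = soup.find_all("a", {"class": "web-btn-link easy-click", "href": True})

# Match on one class: finds both links; 'href': True drops the tag without an href.
single = soup.find_all("a", {"class": "web-btn-link", "href": True})

print([t["href"] for t in exact])   # ['/detail/one']
print([t["href"] for t in single])  # ['/detail/one', '/detail/two']
```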

This kind of search is much easier with a CSS .select() call:

soup = BeautifulSoup(driver.page_source, 'html.parser')
tags = soup.select("a.web-btn-link.easy-click[href]")

This looks for a tags that have at least both the web-btn-link and easy-click classes (in any order), and only those that have an href attribute.
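A quick way to check that description is to run the selector against a few hypothetical links (the markup below is made up). The [href] part excludes the tag without an href, a tag with only one of the two classes is skipped, and a tag with the classes in a different order plus an extra class still matches, because a compound class selector only requires the listed classes to be present.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = '''
<a class="web-btn-link easy-click" href="/detail/one">one</a>
<a class="easy-click featured web-btn-link" href="/detail/two">two</a>
<a class="web-btn-link" href="/detail/three">only one class</a>
<a class="web-btn-link easy-click">no href</a>
'''
soup = BeautifulSoup(html, "html.parser")

# Both classes required (any order, extra classes allowed), plus an href attribute.
tags = soup.select("a.web-btn-link.easy-click[href]")
print([t["href"] for t in tags])  # ['/detail/one', '/detail/two']
```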

The call still produces a list of Tag objects; to get just the attribute values, use subscription:

soup = BeautifulSoup(driver.page_source, 'html.parser')
tags = soup.select("a.web-btn-link[href]")
urls = [t['href'] for t in tags]

Or, to print them one by one:

for tag in tags:
    print(tag['href'])
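Putting the pieces together, here is a self-contained sketch that runs on the snippet from the question instead of driver.page_source, so no Selenium driver is needed. The function name extract_hrefs is hypothetical, and the parser choice ('html.parser') is an assumption.

```python
from bs4 import BeautifulSoup

def extract_hrefs(page_source):
    """Return the href of every <a> tag carrying the web-btn-link class."""
    soup = BeautifulSoup(page_source, "html.parser")
    # [href] guarantees the attribute exists, so t["href"] cannot raise KeyError.
    return [t["href"] for t in soup.select("a.web-btn-link[href]")]

# The link from the question (unclosed, exactly as it appears in the page source).
page_source = '''<a class="web-btn-link easy-click" href="/detail/Request-Technology%2C-LLC-Oakland-CA-94609/napil006/cyberMan" id="position15" onclick="cookieJobID('b54b4b964def18552eefff31d034d2a5');handleBackButton(this);" style="font-size:18px;" title="stuff" value="b54b4b964def18552eefff31d034d2a5">'''

urls = extract_hrefs(page_source)
for url in urls:
    print(url)
```

With a live browser session, the same function would be called as extract_hrefs(driver.page_source).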

Upvotes: 3

undetected Selenium

Reputation: 193138

You can collect all the desired elements with BeautifulSoup, store them in a list, and then iterate over the list to print each element's href attribute as follows:

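href_elements = BeautifulSoup(driver.page_source, 'html.parser').find_all('a', {'class': 'web-btn-link easy-click'})
for href_element in href_elements:
    # Subscription reads the attribute value; href_element.href would search for a child <href> tag.
    print(href_element['href'])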
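Reading the attribute requires subscription (or .get()) rather than attribute-style access: in Beautiful Soup, tag.href is shorthand for tag.find('href'), i.e. a search for a child href element, so it returns None here. The sketch below uses markup abbreviated from the question to contrast the three forms.

```python
from bs4 import BeautifulSoup

html = '<a class="web-btn-link easy-click" href="/detail/one">one</a>'
link = BeautifulSoup(html, "html.parser").find("a")

print(link.href)          # None: looks for a child <href> tag
print(link["href"])       # /detail/one
print(link.get("href"))   # /detail/one
print(link.get("title"))  # None: .get() returns None for a missing attribute
```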

Upvotes: -1
