Reputation: 573
I have page source containing "a class" links like the example below. I would like to return a list of all the "href" values, which for the example below would be "/detail/Request-Technology%2C-LLC-Oakland-CA-94609/napil006/cyberMan". I'm using Beautiful Soup's find_all with the 'a class' attribute, but it doesn't return anything. Can anyone see what I'm doing wrong and suggest a solution?
source:
<a class="web-btn-link easy-click" href="/detail/Request-Technology%2C-LLC-Oakland-CA-94609/napil006/cyberMan" id="position15" onclick="cookieJobID('b54b4b964def18552eefff31d034d2a5');handleBackButton(this);" style="font-size:18px;" title="stuff" value="b54b4b964def18552eefff31d034d2a5">
Code:
BeautifulSoup(driver.page_source).find_all('href', {'a class':'web-btn-link easy-click'})
output:
[]
Upvotes: 0
Views: 2986
Reputation: 1122392
Your first mistake is passing an attribute name to find_all(), which interprets the first argument as a tag name. Next, you are asking find_all() to return only tags that have an "a class" attribute matching the given value, but attribute names can't contain a space, so nothing can match.
Note that you don't have "a class" tags here; you have "a" tags with "class" and "href" attributes. So you'd want to use:
soup = BeautifulSoup(driver.page_source)
tags = soup.find_all('a', {'class': 'web-btn-link', 'href': True})
The 'href': True filter matches only tags that have that attribute defined. Note that I filter on just one of the two classes; see Searching by CSS class in the Beautiful Soup documentation for why this matters, but you generally don't want to exclude tags that carry more classes than the two you've found. In the vast majority of documents you only need to match one of the classes (and easy-click sounds like a class for a script or CSS enhancement, applied to potentially different elements on the page).
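To make the filtering behaviour concrete, here is a minimal sketch. The HTML snippet and its /detail/job-N paths are made up for illustration (they are not from the question's page), and the parser is set explicitly to "html.parser" as an assumption so the example does not depend on which parsers are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source.
html = """
<a class="web-btn-link easy-click" href="/detail/job-1">One</a>
<a class="web-btn-link" href="/detail/job-2">Two</a>
<a class="web-btn-link easy-click">No href</a>
<a class="other" href="/detail/job-3">Other</a>
"""
soup = BeautifulSoup(html, "html.parser")

# The first argument is a tag name, so asking for "href" finds nothing:
# there is no <href> element in the document.
no_match = soup.find_all("href")

# Match <a> tags that carry the web-btn-link class (possibly among others)
# and that define an href attribute.
tags = soup.find_all("a", {"class": "web-btn-link", "href": True})
hrefs = [tag["href"] for tag in tags]

print(no_match)  # []
print(hrefs)     # ['/detail/job-1', '/detail/job-2']
```

The third link is skipped because it has no href, and the fourth because it lacks the class.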
This kind of search is much easier with a CSS .select() call:
soup = BeautifulSoup(driver.page_source)
tags = soup.select("a.web-btn-link.easy-click[href]")
This looks for "a" tags that have at least both the web-btn-link and easy-click classes, and only those that also have an href attribute.
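As a rough illustration of how that selector behaves, the sketch below runs it against a small made-up snippet (the markup and /detail/job-N paths are hypothetical, and "html.parser" is an assumed parser choice). It shows that class order and extra classes do not matter, while a missing class or a missing href excludes the tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source.
html = """
<a class="web-btn-link easy-click" href="/detail/job-1">Both classes</a>
<a class="easy-click web-btn-link extra" href="/detail/job-2">Reordered, extra class</a>
<a class="web-btn-link" href="/detail/job-3">Only one class</a>
<a class="web-btn-link easy-click">No href</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Require both classes and the presence of an href attribute.
both = [t["href"] for t in soup.select("a.web-btn-link.easy-click[href]")]

# Require only one class and the presence of an href attribute.
one = [t["href"] for t in soup.select("a.web-btn-link[href]")]

print(both)  # ['/detail/job-1', '/detail/job-2']
print(one)   # ['/detail/job-1', '/detail/job-2', '/detail/job-3']
```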
The call still produces a sequence of tag objects; to get just the attribute values, use subscription:
soup = BeautifulSoup(driver.page_source)
tags = soup.select("a.web-btn-link[href]")
urls = [t['href'] for t in tags]
Or, to print them one by one:
for tag in tags:
    print(tag['href'])
Upvotes: 3
Reputation: 193138
You can collect all the desired elements with BeautifulSoup, store them in a list, and then iterate over the list to print each href attribute as follows:
href_elements = BeautifulSoup(driver.page_source).find_all('a', {'class':'web-btn-link easy-click'})
for href_element in href_elements:
    print(href_element['href'])
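One thing worth checking when reading attributes this way: Beautiful Soup treats dotted access such as tag.href as a search for a child tag named href, not as an HTML attribute lookup, so it yields None for a link like the one in the question. A minimal sketch, using a made-up one-line snippet and an assumed "html.parser" parser:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the path is illustrative only.
html = '<a class="web-btn-link easy-click" href="/detail/job-1">One</a>'
tag = BeautifulSoup(html, "html.parser").find("a")

child_lookup = tag.href      # looks for a child <href> tag; there is none
by_subscript = tag["href"]   # reads the HTML attribute; KeyError if absent
by_get = tag.get("href")     # reads the HTML attribute; None if absent
missing = tag.get("title")   # attribute not present on this tag

print(child_lookup)  # None
print(by_subscript)  # /detail/job-1
print(by_get)        # /detail/job-1
print(missing)       # None
```

Subscription is the stricter choice when every matched tag is known to have the attribute; .get() is the more forgiving one when some tags may lack it.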
Upvotes: -1