user21398

Reputation: 1493

How to find a specific data attribute from an HTML tag in BeautifulSoup4?

Is there a way to find an element using only the data attribute in html, and then grab that value?

For example, with this line inside an html doc:

<ul data-bin="Sdafdo39">

How do I retrieve Sdafdo39 by searching the entire html doc for the element that has the data-bin attribute?

Upvotes: 28

Views: 42603

Answers (4)

Maximosaic

Reputation: 634

As an alternative, if you prefer CSS selectors, you can use select() instead of find_all():

from bs4 import BeautifulSoup
html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39">"""
soup = BeautifulSoup(html_doc, "html.parser")

# Select every <ul> that has a data-bin attribute
soup.select('ul[data-bin]')
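
Note that select() returns the matching tags, not the attribute values, so one more step is needed to get Sdafdo39 itself. A minimal sketch of that step, assuming the built-in "html.parser" backend; the variable names values, first and first_value are only illustrative:

```python
from bs4 import BeautifulSoup

html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39"></ul>"""
soup = BeautifulSoup(html_doc, "html.parser")

# select() returns a list of every tag matching the CSS selector
values = [tag["data-bin"] for tag in soup.select("ul[data-bin]")]

# select_one() returns the first match, or None when nothing matches
first = soup.select_one("[data-bin]")
first_value = first["data-bin"] if first is not None else None

print(values)       # ['Sdafdo39']
print(first_value)  # Sdafdo39
```

Dropping the tag name from the selector, as in "[data-bin]", matches any element with that attribute, which is probably closer to searching the whole document.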

Upvotes: 4

emehex

Reputation: 10568

You could solve this with gazpacho in just a couple of lines:

First, import Soup and turn the HTML into a Soup object:

from gazpacho import Soup

html = """<ul data-bin="Sdafdo39">"""
soup = Soup(html)

Then you can just search for the "ul" tag and extract the data-bin attribute:

soup.find("ul").attrs["data-bin"]
# Sdafdo39

Upvotes: 4

xecgr

Reputation: 5193

A slightly more precise approach is to let find_all() do the filtering:

[item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})]


This way, the list being iterated contains only the ul elements that have the attribute you want to find:

from bs4 import BeautifulSoup
html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39">"""
bs = BeautifulSoup(html_doc, "html.parser")  # html_doc must be defined before it is parsed
[item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})]
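
If the tag name is not known in advance, the same filter seems to work without it. A small sketch, assuming the "html.parser" backend; the extra div in the sample markup is made up purely for illustration:

```python
from bs4 import BeautifulSoup

# Sample markup; the <div> is a hypothetical second element carrying data-bin
html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39"></ul><div data-bin="x1"></div>"""
bs = BeautifulSoup(html_doc, "html.parser")

# No tag name given: match any element that has a data-bin attribute
bins = [tag["data-bin"] for tag in bs.find_all(attrs={"data-bin": True})]
print(bins)  # ['Sdafdo39', 'x1']
```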


Upvotes: 43

thefourtheye

Reputation: 239653

You can use the find_all() method to get all the tags, then keep only those that have "data-bin" in their attributes. From each remaining tag you can simply read the corresponding value, like this:

from bs4 import BeautifulSoup
html_doc = """<ul data-bin="Sdafdo39">"""
bs = BeautifulSoup(html_doc, "html.parser")
# print() is a function in Python 3
print([item["data-bin"] for item in bs.find_all() if "data-bin" in item.attrs])
# ['Sdafdo39']
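
When only one element is expected, find() may be a little simpler than building a list. A hedged sketch, assuming the "html.parser" backend; "data-missing" is a made-up attribute name used only to show the fallback behaviour:

```python
from bs4 import BeautifulSoup

html_doc = """<ul data-bin="Sdafdo39"></ul>"""
bs = BeautifulSoup(html_doc, "html.parser")

# find() returns the first matching tag, or None if there is no match
tag = bs.find(attrs={"data-bin": True})
value = tag["data-bin"] if tag is not None else None
print(value)  # Sdafdo39

# Tag.get() returns None instead of raising KeyError for a missing attribute
print(tag.get("data-missing"))  # None
```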

Upvotes: 19
