Reputation: 10213
I want to get all data-js attribute values from the content with BeautifulSoup.
Input:
<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>
Output:
['1, 2, 3', '5', '4']
I've done it with lxml:
>>> content = """<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>"""
>>> import lxml.html as PARSER
>>> root = PARSER.fromstring(content)
>>> root.xpath("//*/@data-js")
['1, 2, 3', '5', '4']
I want the above result via BeautifulSoup.
Upvotes: 4
Views: 1236
Reputation: 4058
Maybe a faster method, using map instead of a list comprehension.
from bs4 import BeautifulSoup
d = "..."
# create a soup instance
soup = BeautifulSoup(d)
# find all p-elements that have a data-js attribute
p = soup.find_all('p', attrs={"data-js": True})
# pull the data-js attribute out of each p-element and map it to a new list
print map(lambda x: x['data-js'], p)
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all
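As a rough sketch of the same idea on Python 3 (assuming d holds the markup from the question and using the standard library parser): map() returns an iterator there, so wrap it in list() to get the plain list back.
from bs4 import BeautifulSoup

d = '<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>'

soup = BeautifulSoup(d, "html.parser")
p = soup.find_all('p', attrs={"data-js": True})
# on Python 3, map() is lazy, so materialize it with list() before printing
print(list(map(lambda x: x['data-js'], p)))  # ['1, 2, 3', '5', '4']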
Upvotes: 3
Reputation: 2915
You can use find_all() for this, but you have to put the attribute name in a dictionary, because a name containing a hyphen can't be passed as a keyword argument.
html = BeautifulSoup(content)
data = html.find_all(attrs={'data-js': True})
See here for more explanation.
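Note that find_all() returns the matching Tag objects rather than the attribute values themselves, so you still index each tag to get the list the question asks for. A minimal sketch, reusing the markup from the question and naming the standard library parser explicitly:
from bs4 import BeautifulSoup

content = '<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>'

html = BeautifulSoup(content, "html.parser")
# find_all() gives back the <p> tags themselves...
data = html.find_all(attrs={'data-js': True})
# ...so index each tag to extract the attribute value
print([tag['data-js'] for tag in data])  # ['1, 2, 3', '5', '4']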
Upvotes: 2
Reputation: 473863
The idea would be to find all elements that have a data-js attribute and collect the values in a list:
from bs4 import BeautifulSoup
data = """
<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>
"""
soup = BeautifulSoup(data)
print [elm['data-js'] for elm in soup.find_all(attrs={"data-js": True})]
Prints ['1, 2, 3', '5', '4'].
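As an aside, a CSS attribute selector should produce the same list; a minimal sketch, assuming the same soup object as above:
print([elm['data-js'] for elm in soup.select('p[data-js]')])  # ['1, 2, 3', '5', '4']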
Upvotes: 4