Reputation: 35
When I use Beautiful Soup's find_all() and loop over the results, I can't access an attribute of each element.
<html>
...
<article post-id="123">Article 1</article>
<article post-id="456">Article 2</article>
<article post-id="789">Article 3</article>
...
</html>
articles = soup.find_all('article')
for a in articles:
    post_id = a.find('article', {'post-id':True})
    print(post_id)
This prints None for each article.
Based on previous answers I've found here, I thought the post-id values (123, 456, etc.) could be accessed using post_id['post-id'].
When I print a inside the for loop, I see the <article...> element.
What is the correct way to access the post-id value?
Upvotes: 1
Views: 47
Reputation: 195408
To get the post-id value, try:
from bs4 import BeautifulSoup
html_doc = """\
<html>
<article post-id="123">Article 1</article>
<article post-id="456">Article 2</article>
<article post-id="789">Article 3</article>
</html>"""
soup = BeautifulSoup(html_doc, "html.parser")
for a in soup.select("article[post-id]"):
    print(a["post-id"])
Prints:
123
456
789
soup.select("article[post-id]") - selects all <article> elements that have a post-id attribute
print(a["post-id"]) - prints the value of the post-id attribute
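As for why the original loop printed None: a.find(...) searches the descendants of a, and none of the <article> tags contains another <article>, so there is nothing to find. Since a already is the tag you want, you can index it directly. Here is a minimal sketch that keeps find_all() instead of select(). The third <article> without a post-id is my own addition to the sample, only there to show how .get() behaves when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Sample markup; the last <article> (no post-id) is an assumption added for illustration
html_doc = """\
<html>
<article post-id="123">Article 1</article>
<article post-id="456">Article 2</article>
<article>No id here</article>
</html>"""

soup = BeautifulSoup(html_doc, "html.parser")

# attrs={"post-id": True} keeps only the tags that carry the attribute
articles = soup.find_all("article", attrs={"post-id": True})

# What the question did: look for a nested <article> inside each <article>
nested = [a.find("article", {"post-id": True}) for a in articles]

# Index the tag itself to read the attribute value
post_ids = [a["post-id"] for a in articles]

# .get() returns None instead of raising KeyError when the attribute is absent
all_ids = [a.get("post-id") for a in soup.find_all("article")]

print(nested)
print(post_ids)
print(all_ids)
```

Use a["post-id"] when you have already filtered on the attribute, and a.get("post-id") when some tags may not have it.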
Upvotes: 1