offtheradar

Reputation: 35

Beautiful Soup unable to access first tag in find_all list

When I use Beautiful Soup's find_all() and loop over the results, I can't read the post-id attribute of each element.

<html>
...
<article post-id="123">Article 1</article>
<article post-id="456">Article 2</article>
<article post-id="789">Article 3</article>
...
</html>

articles = soup.find_all('article')

for a in articles:
    post_id = a.find('article', {'post-id':True})
    print(post_id)

This prints None for every element.

Based on previous answers I've found here, I thought the post-id values (123, 456, etc.) could be accessed using post_id['post-id'].

When I print a inside the for loop I see the <article...> element.

What is the correct way to access the post-id value?

Upvotes: 1

Views: 47

Answers (1)

Andrej Kesely

Reputation: 195408

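a.find('article', ...) searches only the descendants of a, not a itself, so it returns None when the <article> has no nested <article>. Each a is already the element you want. To get the value of the post-id attribute, try: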

from bs4 import BeautifulSoup

html_doc = """\
<html>
<article post-id="123">Article 1</article>
<article post-id="456">Article 2</article>
<article post-id="789">Article 3</article>
</html>"""

soup = BeautifulSoup(html_doc, "html.parser")

for a in soup.select("article[post-id]"):
    print(a["post-id"])

Prints:

123
456
789

  • soup.select("article[post-id]") - selects all <article> elements that have a post-id attribute

  • print(a["post-id"]) - prints the value of the post-id attribute
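If you would rather keep the find_all() loop from the question, a minimal sketch might look like the following. The third <article> without a post-id and the post_ids list are assumptions added for illustration; they are not part of the original markup.

```python
from bs4 import BeautifulSoup

html_doc = """\
<html>
<article post-id="123">Article 1</article>
<article post-id="456">Article 2</article>
<article>Article without an id</article>
</html>"""

soup = BeautifulSoup(html_doc, "html.parser")

post_ids = []  # hypothetical name, used only in this sketch
for a in soup.find_all("article"):
    # find() looks inside `a` only, so with no nested <article> it returns None.
    assert a.find("article", {"post-id": True}) is None
    # Tag.get() returns None for a missing attribute instead of raising KeyError.
    post_id = a.get("post-id")
    if post_id is not None:
        post_ids.append(post_id)

print(post_ids)
```

Using a.get("post-id") rather than a["post-id"] is a defensive choice for pages where some <article> elements may lack the attribute; when every element is known to carry it, indexing works just as well.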

Upvotes: 1
