root

Reputation: 80346

BeautifulSoup: extracting HTML tag attributes

Is there a way to get HTML tag attributes only for tags that contain text (what text=True matches), without specifying the tag names?

Example:

html = '<p class="c4">SOMETEXT</p>'

I could do:

>>> [tag.attrs for tag in soup.findAll('p')]
[[(u'class', u'c4')]]

Is there a way to do:

[text.attrs for text in soup.findAll(text=True)]

Any help is much appreciated!

Upvotes: 1

Views: 1274

Answers (2)

Jon Clements

Reputation: 142106

Now that the question has been clarified, I think you want this:

[tag.attrs for tag in soup.findAll(True) if tag.string]

.findAll(True) returns every tag in the document, so each one has an .attrs (empty if the tag has no attributes), and the if tag.string condition keeps only the tags that have .string content.
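A minimal runnable sketch of the one-liner above, assuming bs4 (BeautifulSoup 4) with the built-in "html.parser" backend; the sample markup is made up for illustration, and find_all is the bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Made-up sample: two tags with text and one empty tag.
html = '<p class="c4">SOMETEXT</p><p class="c5"></p><span id="s1">more</span>'
soup = BeautifulSoup(html, "html.parser")

# find_all(True) matches every tag regardless of its name.
# The empty <p class="c5"> has no children, so its .string is None and it is dropped.
attrs = [tag.attrs for tag in soup.find_all(True) if tag.string]
print(attrs)  # attrs == [{'class': ['c4']}, {'id': 's1'}]
```

One thing to watch: .string is passed up from a lone child tag, so a wrapper such as <div><b>x</b></div> matches too. That is also why the parser is pinned here; lxml and html5lib wrap a fragment in <html><body>, and those wrappers would then match for a single-tag document like the one in the question.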

Upvotes: 3

Burhan Khalid

Reputation: 174614

>>> from bs4 import BeautifulSoup as bs
>>> html = '<p class="c4">SOMETEXT</p><p class="c5"></p>'
>>> soup = bs(html, 'html.parser')
>>> [tag.attrs for tag in soup.findAll('p') if tag.string]
[{'class': ['c4']}]
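Going the other way also works and stays close to the question's findAll(text=True) attempt: that call returns NavigableString objects, which have no .attrs, but each string's .parent is the tag that encloses it. A minimal sketch, assuming bs4 4.4 or later (where string= is the newer name for text=) and made-up sample markup:

```python
from bs4 import BeautifulSoup

# Made-up sample; the <div> holds text directly and inside a child <b>.
html = ('<p class="c4">SOMETEXT</p><p class="c5"></p>'
        '<div id="d1">a<b class="x">b</b></div>')
soup = BeautifulSoup(html, "html.parser")

# find_all(string=True) yields the text nodes; .parent is the enclosing tag.
# strip() skips whitespace-only strings, e.g. newlines between tags.
attrs = [s.parent.attrs for s in soup.find_all(string=True) if s.strip()]
print(attrs)  # attrs == [{'class': ['c4']}, {'id': 'd1'}, {'class': ['x']}]
```

Unlike the tag.string filter, this also reports tags with mixed content such as the <div> here, but a tag with several separate text children appears once per string, so deduplicate if that matters.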

Upvotes: 1
