Reputation: 141
I want to be able to get all the tags in an HTML file, say:
<html>
<body>
<something>
</something>
</body>
</html>
I want this to return something like: ['html', 'body', 'something']
While bs4 can find all instances of a given tag, I have yet to find anything that returns
all tags. This is the code I wrote to print a clean version of the file:
from bs4 import BeautifulSoup

with open('nameofhtm.html') as f:
    soup = BeautifulSoup(f, 'lxml')
print(soup.prettify())
Output:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>
nothing
</title>
<link href="None" rel="shortcut icon"/>
<link href="style.css" rel="stylesheet"/>
<header>
nothing more
</header>
<something>
</something>
</head>
</html>
Is there a way? Thanks in advance
Upvotes: 0
Views: 536
Reputation: 7157
You could use a filter function and extract all the tag names:
from bs4 import BeautifulSoup

soup = BeautifulSoup(your_html, 'lxml')  # name the parser explicitly, as in the question
tag_names = [tag.name for tag in soup.find_all(lambda tag: tag is not None)]
Alternatively, soup.find_all(name=True) matches every tag
regardless of its name, i.e.
soup = BeautifulSoup(your_html, 'lxml')  # name the parser explicitly, as in the question
tag_names = [tag.name for tag in soup.find_all(name=True)]
which is equivalent to the filter function.
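Note that both versions return one name per tag occurrence, so a tag that appears twice is listed twice. If the goal is the list from the question, with each tag name once, the names can be de-duplicated afterwards. A minimal sketch, assuming a small inline sample document (sample_html is a hypothetical stand-in for the real file) and the built-in 'html.parser' so that no extra html/body/head tags are added by the parser:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the contents of the HTML file.
sample_html = """
<html>
<body>
<something>
</something>
<something>
</something>
</body>
</html>
"""

# 'html.parser' ships with Python; 'lxml' also works if it is installed.
soup = BeautifulSoup(sample_html, "html.parser")

# find_all(True) matches every tag, in document order.
all_names = [tag.name for tag in soup.find_all(True)]

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_names = list(dict.fromkeys(all_names))

print(all_names)     # ['html', 'body', 'something', 'something']
print(unique_names)  # ['html', 'body', 'something']
```

A set would also remove duplicates, but it does not preserve document order, which is why dict.fromkeys is used here.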
Upvotes: 2