Get all tags of an HTML document with bs4

Question

I want to be able to get all the tags of an HTML file, say:

I want this to return something like ['html', 'body', 'something']. While bs4 can find all instances of a given tag, I have yet to find anything that returns every tag in the document. This is the code I wrote to print a clean version of the file:

from bs4 import BeautifulSoup

with open('nameofhtm.html') as f:
    soup = BeautifulSoup(f, 'lxml')
    print(soup.prettify())

Output:



 
  
  
   nothing
  
  
  
  
   nothing more

Is there a way? Thanks in advance

joni · Accepted Answer

You could use a filter function and extract all the tag names:

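from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html, 'lxml')  # same parser as in the question
# The filter returns True for every tag, so find_all yields all of them.
tag_names = [tag.name for tag in soup.find_all(lambda tag: tag is not None)]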

Alternatively, soup.find_all(name=True) matches every tag regardless of its name:

soup = BeautifulSoup(your_html, 'lxml')
tag_names = [tag.name for tag in soup.find_all(name=True)]

This is equivalent to the filter function, since that filter accepts every tag.
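Both versions return one name per tag occurrence, so repeated tags appear more than once. If the goal is a list like ['html', 'body', 'something'], the names can be de-duplicated afterwards. Here is a minimal sketch of that. The sample markup and the helper name unique_tag_names are hypothetical, since the question's original HTML is not shown, and html.parser is used here only so that the parser does not add tags of its own.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the question's original HTML is not shown.
sample_html = """
<html>
 <body>
  <something>nothing</something>
  <something>nothing more</something>
 </body>
</html>
"""


def unique_tag_names(html):
    """Return each tag name once, in order of first appearance."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every tag, in document order.
    names = (tag.name for tag in soup.find_all(True))
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(names))


print(unique_tag_names(sample_html))  # ['html', 'body', 'something']
```

With a different parser such as lxml, the result may include extra names (for example html or body) when the input is a fragment, because the parser can insert missing wrapper tags.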
