I have a set of HTML files, and I want to pull the first tag from each one. The files don’t share a specific tag that is always first, so I’m not sure how to do this.
As an example, for the following snippet, the first tag would be <html>.
<html>
<head>
<title>
insert title here
</title>
</head>
</html>
Any way to accomplish this with BeautifulSoup (or possibly another tool)? Thanks in advance :)
Upvotes: 7
Views: 13534
Reputation: 473763
You can use BeautifulSoup in this case. Just call find() with no arguments on a BeautifulSoup object; it returns the first element in the tree. Its .name attribute gives you the tag name:
from bs4 import BeautifulSoup
data = """
<html>
<head>
<title>
insert title here
</title>
</head>
</html>
"""
soup = BeautifulSoup(data, "html.parser")
print(soup.find().name)
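Since the question also asks about other tools, here is a minimal sketch of the same idea using only the standard library's html.parser module, in case installing bs4 is not an option. The names FirstTagParser, first_tag_name and first_tags_in_dir are made up for this illustration, and the "*.html" pattern and UTF-8 encoding are assumptions about your files:

```python
from html.parser import HTMLParser
from pathlib import Path


class FirstTagParser(HTMLParser):
    """Hypothetical helper: records the name of the first start tag it sees."""

    def __init__(self):
        super().__init__()
        self.first_tag = None

    def handle_starttag(self, tag, attrs):
        # Keep only the first tag; later start tags are ignored.
        if self.first_tag is None:
            self.first_tag = tag


def first_tag_name(html):
    """Return the first tag name in an HTML string, or None if it has no tags."""
    parser = FirstTagParser()
    parser.feed(html)
    parser.close()
    return parser.first_tag


def first_tags_in_dir(directory, pattern="*.html"):
    """Map each matching file name to its first tag name.

    Assumes the files are UTF-8 encoded; adjust the encoding if yours differ.
    """
    return {
        path.name: first_tag_name(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob(pattern))
    }
```

Calling first_tags_in_dir("some/folder") would give you a dict of file name to first tag name. Doctype declarations and comments are not start tags, so they are skipped, and a file with no tags at all yields None.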
Upvotes: 7