Simon482

Reputation: 137

Extract data from HTML

I have an HTML document with the following structure:

<!DOCTYPE html>
<html>
<body>

<p>One</p>
<p>Two</p>
<p>Three</p>

</body>
</html>

Can you recommend a Python module that would let me write something like this:

var = ModuleName.html.body.p2
print(var)
Two

Upvotes: 0

Views: 186

Answers (2)

Paul K.

Reputation: 816

I would recommend using BeautifulSoup to parse your HTML and extract the content you want with CSS selectors.

The documentation has an example very similar to what you want to do: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

Edit: Here is a code snippet, since the documentation has a typo and omits the ":" in the selector string.

from bs4 import BeautifulSoup

data = "<!DOCTYPE html> <html> <body><p>One</p><p>Two</p><p>Three</p></body></html>"

soup = BeautifulSoup(data, 'html.parser')
print(soup.body.select("p:nth-of-type(2)"))  # prints [<p>Two</p>]
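
Note that select() returns a list of matching tags rather than the text itself. If only the string "Two" is wanted, a minimal sketch (assuming the same HTML as in the question and a BeautifulSoup version that provides select_one()) could look like this:

```python
from bs4 import BeautifulSoup

# Assumption: same document as in the question, collapsed onto one line.
data = "<!DOCTYPE html> <html> <body><p>One</p><p>Two</p><p>Three</p></body></html>"

soup = BeautifulSoup(data, "html.parser")

# select_one() returns the first match (or None) instead of a list.
second_p = soup.body.select_one("p:nth-of-type(2)")

# get_text() extracts the text content of the matched tag.
text = second_p.get_text() if second_p is not None else None
print(text)
```

The None check is there because select_one() returns None when nothing matches the selector.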

Upvotes: 1

alecxe

Reputation: 474191

BeautifulSoup gets quite close to what you are asking for:

from bs4 import BeautifulSoup

# `data` holds the HTML document from the question
soup = BeautifulSoup(data, 'html.parser')

print(soup.html.body("p")[1].text)  # prints Two

In other words, attribute access (the dot) is a shortcut for find(), and calling a tag (the parentheses) is a shortcut for find_all().
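
To make the two shortcuts concrete, here is a rough sketch that puts each one next to its explicit equivalent. It assumes `data` is the HTML from the question and uses the built-in 'html.parser'; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Assumption: `data` is the HTML document from the question.
data = "<!DOCTYPE html><html><body><p>One</p><p>Two</p><p>Three</p></body></html>"

soup = BeautifulSoup(data, "html.parser")

# Dot access is shorthand for find(): both give the first matching tag.
body_short = soup.html.body
body_long = soup.find("html").find("body")

# Calling a tag is shorthand for find_all(): both give every matching tag.
paras_short = body_short("p")
paras_long = body_long.find_all("p")

# Index 1 is the second <p>, since the result behaves like a list.
second = paras_short[1].text
print(second)
```

The explicit find() / find_all() form is longer but tends to be easier to read when the tag name clashes with a method or attribute name on the object.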

Upvotes: 2
