Aniruddh Chaturvedi

Reputation: 649

Parsing HTML Tables to Lists in Python w/o BeautifulSoup

Is there a way to extract data from an HTML table and parse it into a dictionary using only HTMLParser? I haven't been able to get it working.

Upvotes: 0

Views: 330

Answers (1)

dangerChihuahua007

Reputation: 20895

You could use lxml (http://lxml.de/) to parse a web page.

You could load and parse a page with:

from lxml.html import parse

# Fetch the page and parse it into an element tree
site = parse('http://java.sun.com')

What's returned here is an lxml element tree: http://lxml.de/api.html

Then you can use XPath to select HTML content (http://www.w3schools.com/xpath/):

tableData = site.xpath('//table//td[@id="someTdID"]')

lxml is a powerful library that is widely used for scraping. The xpath() call returns a list of matching elements, which you can feed into Python dictionaries or lists, or process however you like.
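If installing lxml isn't an option and you want to stay with the standard library, as the question asks, here is a minimal sketch using html.parser.HTMLParser (the module is named HTMLParser in Python 2). The names TableParser and table_to_dicts and the sample markup are made up for illustration, and the sketch assumes a single, non-nested table whose first row holds the column headers:

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed


def table_to_dicts(html):
    """Treat the first row as headers and return one dict per later row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample table, for illustration only
sample = """
<table>
  <tr><th>name</th><th>lang</th></tr>
  <tr><td>lxml</td><td>C</td></tr>
  <tr><td>HTMLParser</td><td>Python</td></tr>
</table>
"""
records = table_to_dicts(sample)
```

Each entry in records is a dict keyed by the header cells. If you only need lists, parser.rows already holds one list of cell strings per table row. Nested tables or cells spanning several columns would need extra handling that this sketch does not attempt.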

Upvotes: 1
