Reputation: 649
Is there a way to extract the data from an HTML table and parse it into a dictionary using only HTMLParser? I have tried, but I have not been able to get it working.
Upvotes: 0
Views: 330
Reputation: 20895
You could use lxml to parse a web page. http://lxml.de/
You could fetch and parse a web page with:
from lxml.html import parse
site = parse('http://java.sun.com')
What's returned here is an lxml element tree: http://lxml.de/api.html
Then, you can use xpath to get HTML content (http://www.w3schools.com/xpath/):
tableData = site.xpath('//table//td[@id="someTdID"]')
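As a rough illustration of how the xpath results could be turned into dictionaries, here is a minimal sketch. It assumes the first table row holds the column names, and it parses an inline sample string with lxml.html.fromstring instead of fetching a URL. The names SAMPLE_HTML, rows_as_dicts and table_rows are made up for this example.

```python
from lxml import html

# Hypothetical sample markup standing in for a fetched page.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>name</th><th>age</th></tr>
  <tr><td>Ann</td><td>31</td></tr>
  <tr><td>Bob</td><td>27</td></tr>
</table>
</body></html>
"""


def rows_as_dicts(document):
    """Return one dict per data row, keyed by the cells of the first row."""
    rows = document.xpath("//table//tr")
    if not rows:
        return []
    # Assumption: the first row is the header row (th or td cells).
    header = [cell.text_content().strip() for cell in rows[0].xpath("./th|./td")]
    result = []
    for row in rows[1:]:
        values = [cell.text_content().strip() for cell in row.xpath("./td")]
        result.append(dict(zip(header, values)))
    return result


doc = html.fromstring(SAMPLE_HTML)
table_rows = rows_as_dicts(doc)
print(table_rows)
```

If a page contains several tables, the xpath expression would need to be narrowed (for example by an id attribute, as in the line above) so that rows from different tables are not mixed together.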
lxml is a powerful library and is widely used for scraping. Once you have the matching elements, you can read their text and store it in Python dictionaries or lists, or process it however you like.
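If you would rather stay with HTMLParser alone, as the question asks, something along these lines might work. This is only a sketch: it uses the Python 3 module name html.parser (in Python 2 the class lives in the HTMLParser module), it assumes a single simple table whose first row holds the column names, and it does not handle nested tables or colspan/rowspan. TableParser, table_to_dicts and SAMPLE are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_dicts(markup):
    """Parse markup and return one dict per data row, keyed by the header row."""
    parser = TableParser()
    parser.feed(markup)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample markup.
SAMPLE = (
    "<table>"
    "<tr><th>name</th><th>age</th></tr>"
    "<tr><td>Ann</td><td>31</td></tr>"
    "<tr><td>Bob</td><td>27</td></tr>"
    "</table>"
)
records = table_to_dicts(SAMPLE)
print(records)
```

The parser keeps just enough state to know whether it is inside a row and inside a cell, which is the part that tends to trip people up with HTMLParser, since it only reports start tags, end tags and text as separate events.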
Upvotes: 1