Khrystyna Pyurkovska

Parser for information extraction from the Web in Python

My task is to parse an HTML page (in Cyrillic) and extract certain words. Here is the page I have to parse: http://www.toponymic-dictionary.in.ua/. So far I have only fetched the page:

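from urllib.request import urlopen  # Python 3; the original used Python 2's urllib.urlopen
from lxml.html import fromstring

url = 'http://www.toponymic-dictionary.in.ua/'
# Pass raw bytes so the parser can apply the charset the page declares
content = urlopen(url).read()
doc = fromstring(content)
# Turn relative links into absolute URLs based on the page address
doc.make_links_absolute(url)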

The HTML is too complicated for me to write XPath expressions against, so I don't know how to proceed with the parsing.
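For reference, XPath with lxml can stay short when it targets a single container and pulls out only its text nodes. A minimal sketch, assuming a hypothetical layout: the markup, the `content` id and the helper `extract_words` below are made up, because the real page's structure is not shown here. Inspect the actual page in the browser's developer tools and adapt the expression.

```python
from lxml.html import fromstring

# Hypothetical markup: a stand-in for the real page, whose structure is unknown
SAMPLE_HTML = """
<html><body>
  <div id="content">
    <p>Київ <a href="/kyiv">детальніше</a></p>
    <p>Львів <a href="/lviv">детальніше</a></p>
  </div>
</body></html>
"""

def extract_words(html, xpath_expr):
    """Return the stripped, non-empty strings matched by xpath_expr."""
    doc = fromstring(html)
    return [t.strip() for t in doc.xpath(xpath_expr) if t.strip()]

# text() selects only the direct text children of each <p>, so the link text is skipped
words = extract_words(SAMPLE_HTML, '//div[@id="content"]/p/text()')
print(words)
```

With the real page, the same kind of expression would be passed to `doc.xpath(...)` on the `doc` built from `urlopen(url).read()`.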

Answers (1)

vivek

Have a look at the BeautifulSoup library and its documentation.

It is well suited to your requirement.
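A minimal BeautifulSoup (bs4) sketch of this kind of extraction, assuming the words of interest sit as links inside one container. The markup, the `content` id and `extract_link_words` are hypothetical stand-ins, since the real page's structure is not shown.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a stand-in for the real page, whose structure is unknown
SAMPLE_HTML = """
<html><body>
  <div id="content">
    <ul>
      <li><a href="/kyiv">Київ</a></li>
      <li><a href="/lviv">Львів</a></li>
    </ul>
  </div>
  <div id="footer"><a href="/about">Про сайт</a></div>
</body></html>
"""

def extract_link_words(html):
    """Return the text of every link inside the (hypothetical) #content block."""
    # "html.parser" is Python's built-in parser; "lxml" can be used instead if installed
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", id="content")
    if content is None:
        return []
    # Footer links are ignored because the search is limited to the container
    return [a.get_text(strip=True) for a in content.find_all("a")]

print(extract_link_words(SAMPLE_HTML))
```

For the live page, pass the raw bytes from `urlopen(url).read()` to `BeautifulSoup`; given bytes, it tries to detect the encoding itself (including a charset declared in the page), which helps with Cyrillic text.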

Cheers!