Hick

Reputation: 36394

How do I parse a wiki page in Python without taking a dump of it?

Is it possible to parse a wiki without taking its dump, since the dump itself is far too much data to handle? Say I have the URL of a certain wiki page and fetch it through urllib: how do I then parse it and extract a certain type of data using Python?

By "type" I mean data corresponding to a semantic match against the search being performed.

Upvotes: 2

Views: 126

Answers (2)

ramdaz

Reputation: 1791

I'd suggest an option such as HarvestMan instead, since a semantic search is likely to span multiple pages, which goes beyond a simpler solution such as BeautifulSoup. A rough sketch of the multi-page idea follows.
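HarvestMan's own API isn't shown here, but the multi-page point can be illustrated with a minimal crawl sketch using urllib and BeautifulSoup (bs4). The starting URL, the User-Agent string, and the `match` predicate are hypothetical placeholders, not anything from a specific library:

    # Minimal breadth-first crawl sketch; not HarvestMan itself.
    from urllib.request import Request, urlopen
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    def crawl(start_url, match, max_pages=10):
        """Collect paragraphs satisfying `match` across linked wiki pages."""
        seen, queue, hits = set(), [start_url], []
        while queue and len(seen) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            req = Request(url, headers={"User-Agent": "wiki-crawl-example/0.1"})
            soup = BeautifulSoup(urlopen(req).read(), "html.parser")
            hits += [p.get_text() for p in soup.find_all("p")
                     if match(p.get_text())]
            # follow in-wiki links only
            for a in soup.find_all("a", href=True):
                if a["href"].startswith("/wiki/"):
                    queue.append(urljoin(url, a["href"]))
        return hits

With a predicate like `lambda text: "keyword" in text`, this gathers matching paragraphs from the start page and the pages it links to, which is the kind of multi-page search a single-page parse can't do.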

Upvotes: 0

Mark Byers

Reputation: 838066

You need an HTML parser to get the useful data from the HTML.

You can use BeautifulSoup to help parse the HTML. I recommend that you read the documentation and have a look at the examples there.
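A minimal sketch of the fetch-and-parse step, assuming Python 3's urllib and the bs4 package (`pip install beautifulsoup4`); the URL and User-Agent are just example values:

    from urllib.request import Request, urlopen
    from bs4 import BeautifulSoup

    url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
    req = Request(url, headers={"User-Agent": "wiki-parse-example/0.1"})
    html = urlopen(req).read()        # fetch one page, no dump needed
    soup = BeautifulSoup(html, "html.parser")

    # e.g. pull out the article's first paragraph
    first_para = soup.find("p")
    print(first_para.get_text())

From the `soup` object you can use `find`/`find_all` with tag names, classes, or ids to pull out whatever data you're after.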

Upvotes: 1
