Reputation: 36394
Is it possible to parse a wiki without downloading its dump? The dump itself is far too much data to handle. Say I have the URL of a certain wiki page: once I fetch it through urllib, how do I parse it and extract a certain type of data using Python?
By "type" I mean data that semantically matches the search being performed.
Upvotes: 2
Views: 126
Reputation: 1791
I'd suggest an option such as HarvestMan instead, since a semantic search is likely to span multiple pages, compared to a simpler solution such as BeautifulSoup.
Upvotes: 0
Reputation: 838066
You need an HTML parser to get the useful data from the HTML.
You can use BeautifulSoup to help parse the HTML. I recommend that you read the documentation and have a look at the examples there.
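For instance, a minimal sketch of fetching a page with urllib and pulling out its paragraph text with BeautifulSoup might look like the following. The Wikipedia URL in the comment and the `extract_paragraphs` helper are illustrative assumptions, not part of any particular wiki's API; the inline HTML stands in for a fetched page so the example is self-contained:

```python
import urllib.request
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def extract_paragraphs(html):
    """Return the text of every <p> element in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p")]

# In practice you would fetch the page over HTTP, e.g.:
#   html = urllib.request.urlopen(
#       "https://en.wikipedia.org/wiki/Python_(programming_language)"
#   ).read()
# A small inline document keeps the example self-contained:
html = """
<html><body>
  <h1>Example wiki page</h1>
  <p>First paragraph of article text.</p>
  <p>Second paragraph of article text.</p>
</body></html>
"""

print(extract_paragraphs(html))
```

Once you have the paragraph text, the "semantic match" part is up to you: filter the returned strings for the terms or patterns you care about rather than handling the raw HTML.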
Upvotes: 1