Reputation: 3512
I am trying to do some research on the wikipedia data, I am good at Python.
I came across this library, seems nice: https://pypi.python.org/pypi/wikipedia/
I don't want to hit wikipedia directly as this is slow, and also I am trying to access a lot of data and might run into their API limits.
Can I somehow hack this to make it access a local instance of wikipedia data. I know I can run a whole wikipedia server and try to do that, but that seems a round about way.
Is there a way to just point to the folder and get this library to work as it does. Or are you aware of any other libraries that do this?
thank you.
Upvotes: 1
Views: 567
Reputation: 3512
I figured out what I need. I think I shouldn't be searching for API, what I am looking for is a parser. Here are a couple options I have narrowed down so far. Both seem like solid starting points.
wikidump: https://pypi.python.org/pypi/wikidump/0.1.2
mwlib: https://pypi.python.org/pypi/mwlib/0.15.14
Update: While these are good parsers for wikipedia data, I found them too limiting in one way or the other, not to mention the lack of documentation. So I eventually went with good old python ElementTree and directly work with the XML.
Upvotes: 2