jason

Reputation: 3512

Python library for accessing a local copy of Wikipedia?

I am trying to do some research on Wikipedia data, and I am comfortable with Python.

I came across this library, which seems nice: https://pypi.python.org/pypi/wikipedia/

I don't want to hit Wikipedia directly because it is slow, and since I need to access a lot of data I might run into their API limits.

Can I somehow hack this library to make it access a local copy of the Wikipedia data? I know I could run a whole Wikipedia server and query that, but that seems like a roundabout way.

Is there a way to just point the library at the folder containing the data and have it work as it normally does? Or are you aware of any other libraries that do this?

Thank you.

Upvotes: 1

Views: 567

Answers (1)

jason

Reputation: 3512

I figured out what I need. I shouldn't have been searching for an API; what I am looking for is a parser. Here are a couple of options I have narrowed it down to so far. Both seem like solid starting points.

wikidump: https://pypi.python.org/pypi/wikidump/0.1.2

mwlib: https://pypi.python.org/pypi/mwlib/0.15.14

Update: While these are good parsers for Wikipedia data, I found each of them too limiting in one way or another, not to mention the lack of documentation. So I eventually went with good old Python ElementTree and now work directly with the XML.
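For anyone who wants to try the ElementTree route, here is a minimal sketch of what it could look like, not necessarily the exact code I ended up with. It assumes the usual MediaWiki export layout (`<page>` elements containing a `<title>` and a `<revision>` with a `<text>` child). The helper names `local_name` and `iter_pages` and the `SAMPLE` data are made up for illustration. It uses `iterparse` so the whole dump never has to be loaded at once.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    `source` is a file path or a binary file object. Hypothetical helper:
    it assumes the <page>/<title>/<revision>/<text> layout described above.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            # An empty <text/> element has elem.text set to None.
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            # Drop the finished page's children to keep memory use down.
            elem.clear()


# Tiny made-up sample in the shape of a dump, for demonstration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Python</title>
    <revision><text>Python is a language.</text></revision>
  </page>
  <page>
    <title>Empty</title>
    <revision><text /></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, wikitext in pages:
    print(page_title, len(wikitext))
```

For a real dump you would pass the file instead of the sample; a `.bz2` dump can be opened with `bz2.open(path, "rb")` and handed to `iter_pages` directly. Two caveats with this sketch: if a page has several revisions (full-history dumps), only the last `<text>` seen is kept, and the cleared `<page>` elements still stay attached to the root, so very large dumps may need extra cleanup.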

Upvotes: 2
