Hugolpz

Reputation: 18278

How to get the complete list of page titles from Wikipedia?

I want to get this list so I can later use it for linguistic research.

API:Allpages is limited to 500 results per query, and I need all of them (about 4 million).

Maybe I could approach it through DBpedia.

Is there any trick to do it?

Upvotes: 3

Views: 367

Answers (1)

nneonneo

Reputation: 179717

The Wikimedia Foundation, which runs Wikipedia, posts periodic dumps of all their projects to http://dumps.wikimedia.org.

You can browse the latest enwiki dump (as of this posting) here: http://dumps.wikimedia.org/enwiki/20130204/.

The file which is probably most interesting to you is this list of all page titles: http://dumps.wikimedia.org/enwiki/20130204/enwiki-20130204-all-titles-in-ns0.gz.
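Once the file is downloaded, you don't need to decompress it to disk; it can be streamed line by line. Here is a minimal Python sketch, assuming the file holds one title per line, is UTF-8 encoded, and may start with a `page_title` header line (check the actual file to confirm). The function name `iter_titles` is just illustrative.

```python
import gzip
from typing import Iterator


def iter_titles(path: str, header: str = "page_title") -> Iterator[str]:
    """Yield page titles one at a time from a gzip-compressed titles file."""
    # Text mode lets gzip decode lines lazily, so the multi-million-line
    # list is never held in memory all at once.
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle):
            title = line.rstrip("\n")
            # Assumption: the first line may be a column header, not a title.
            if line_number == 0 and title == header:
                continue
            # Skip blank lines rather than yielding empty titles.
            if title:
                yield title
```

You would then loop over `iter_titles("enwiki-20130204-all-titles-in-ns0.gz")` and process each title as it comes. Titles in the dumps most likely use underscores in place of spaces, so you may want `title.replace("_", " ")` before doing any linguistic processing.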

Upvotes: 8
