gizgok

Reputation: 7649

Where can I get the Wikipedia XML corpus?

I don't know if this can be asked here, but I have looked hard for this and have hit a dead end time and again. I'm working on an Information Retrieval research project. I've coded up my search engine but cannot test it because I need the Wikipedia XML corpus. I found http://www-connex.lip6.fr/~denoyer/wikipediaXML/ but it turned out to be useless. Please let me know if anyone knows a way to get this corpus.

Upvotes: 1

Views: 2592

Answers (1)

Felipe Hummel

Reputation: 4774

The page you linked appears to present the Wikipedia XML corpus used in the 2007 INEX workshop. I've found this site, which holds the Wikipedia dataset used in the 2009-2010 INEX ad hoc track (and, I think, the clustering track too). You should be able to use it as well.

Alternatively, in case it suits your project, there is the official Wikimedia XML dump: English Wikipedia Dumps. For more information and other languages, see Wikipedia Database Download.

Upvotes: 3
