Jader Dias

Reputation: 90475

How to extract statistics from Wikipedia?

I want to extract a list of all dead people in Wikipedia and compare their ages at death. All dead people in Wikipedia have the following fields filled in:

| birth_name = Thomas Alva Edison
| birth_date = {{birth date|mf=yes|1847|02|11}}
| death_date ={{death date and age|mf=yes|1931|10|18|1847|02|11}}

Will I have to write a crawler? Is there anything in the Wikipedia API that can help me? Is there a place where I can start crawling? Is there a list of dead people?

Upvotes: 1

Views: 411

Answers (2)

This is what DBpedia is for - all the structured data from Wikipedia in a database. Try the following query at http://dbpedia.org/sparql :

select distinct ?p ?d where {
  ?p a <http://dbpedia.org/ontology/Person> .
  ?p <http://dbpedia.org/ontology/deathDate> ?d .
}
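
Since the question asks for ages at death, here is a minimal Python sketch that extends the query above with a birth date and computes the age per person. It assumes the endpoint accepts a `format` parameter for JSON results and that DBpedia exposes `http://dbpedia.org/ontology/birthDate` alongside `deathDate`; the function names (`build_query_url`, `fetch_rows`, `age_at_death`) are hypothetical, and values that are not plain ISO dates are simply skipped.

```python
import json
import urllib.parse
import urllib.request
from datetime import date

ENDPOINT = "http://dbpedia.org/sparql"

# Same idea as the query above, with the birth date added (assumed property).
QUERY = """
select distinct ?p ?b ?d where {
  ?p a <http://dbpedia.org/ontology/Person> .
  ?p <http://dbpedia.org/ontology/birthDate> ?b .
  ?p <http://dbpedia.org/ontology/deathDate> ?d .
}
"""


def build_query_url(limit):
    """Build the GET URL for the endpoint, asking for JSON results."""
    params = {
        "query": QUERY + f"LIMIT {int(limit)}",
        "format": "application/sparql-results+json",  # assumption: supported by the endpoint
    }
    return ENDPOINT + "?" + urllib.parse.urlencode(params)


def age_at_death(birth_iso, death_iso):
    """Return the age in whole years, or None if either value is not an ISO date."""
    try:
        b = date.fromisoformat(birth_iso[:10])
        d = date.fromisoformat(death_iso[:10])
    except ValueError:
        return None
    # Subtract one year if the death falls before the birthday in that year.
    return d.year - b.year - ((d.month, d.day) < (b.month, b.day))


def fetch_rows(limit=100):
    """Run the query and yield (person URI, age) pairs. Performs a network request."""
    with urllib.request.urlopen(build_query_url(limit)) as resp:
        data = json.load(resp)
    for row in data["results"]["bindings"]:
        age = age_at_death(row["b"]["value"], row["d"]["value"])
        if age is not None:
            yield row["p"]["value"], age
```

Calling `list(fetch_rows(50))` would return up to 50 pairs. Public endpoints usually cap the result size, so a full extraction would likely need paging with `LIMIT` and `OFFSET`.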

Upvotes: 0

Edoardo Pirovano

Reputation: 8334

You can find a dump of all the contents of Wikipedia available for download here:

http://dumps.wikimedia.org/enwiki/latest/

The main dump is a compressed XML file of several gigabytes that contains the text of every page on Wikipedia (amongst other things). How you process it depends on the programming language you are going to use.
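
As one possible approach in Python, the sketch below streams the XML with `iterparse` instead of loading it into memory, and pulls the age out of the `{{death date and age|...}}` template shown in the question. The helper names (`parse_death_template`, `iter_ages`) are hypothetical, and the regular expression only handles templates that carry all six numeric fields; pages that use partial dates or other templates are skipped.

```python
import io
import re
import xml.etree.ElementTree as ET

# Matches the template from the question; group 1 holds its pipe-separated arguments.
DEATH_RE = re.compile(r"\{\{\s*death date and age\s*\|([^{}]*)\}\}", re.IGNORECASE)


def parse_death_template(wikitext):
    """Return the age at death from the first matching template, or None."""
    m = DEATH_RE.search(wikitext)
    if not m:
        return None
    # Keep only numeric arguments, dropping named ones such as "mf=yes".
    nums = [int(a) for a in (s.strip() for s in m.group(1).split("|")) if a.isdigit()]
    if len(nums) < 6:
        return None
    dy, dm, dd, by, bm, bd = nums[:6]
    # Subtract one year if the death falls before the birthday in that year.
    return dy - by - ((dm, dd) < (bm, bd))


def iter_ages(stream):
    """Yield (title, age) for each page whose text contains a usable template."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = elem.tag.rsplit("}", 1)[-1]  # strip the XML namespace prefix
        if name == "title":
            title = elem.text
        elif name == "text":
            age = parse_death_template(elem.text or "")
            if age is not None:
                yield title, age
        elif name == "page":
            elem.clear()  # drop the page's children to limit memory use


# Tiny in-memory stand-in for the dump, built from the fields in the question.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>Thomas Edison</title><revision><text>{{Infobox person
| birth_date = {{birth date|mf=yes|1847|02|11}}
| death_date ={{death date and age|mf=yes|1931|10|18|1847|02|11}}
}}</text></revision></page>
</mediawiki>"""

results = list(iter_ages(io.BytesIO(SAMPLE.encode("utf-8"))))
```

For the real dump, the same `iter_ages` function could be fed a stream opened with `bz2.open(path, "rb")`, where `path` points at the downloaded file; the exact file name in the dump directory is left as an assumption here.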

Upvotes: 1
