Reputation: 90475
I want to extract a list of all dead people on Wikipedia and compare their ages at death. Articles about dead people have the following fields filled in:
| birth_name = Thomas Alva Edison
| birth_date = {{birth date|mf=yes|1847|02|11}}
| death_date ={{death date and age|mf=yes|1931|10|18|1847|02|11}}
Will I have to write a crawler? Is there anything in the Wikipedia API that can help me? Is there a good place to start crawling, such as a list of dead people?
Upvotes: 1
Views: 411
Reputation: 3587
This is what DBpedia is for - all the structured data from Wikipedia in a database. Try the following query at http://dbpedia.org/sparql :
select distinct ?p ?d where {
?p a <http://dbpedia.org/ontology/Person> .
?p <http://dbpedia.org/ontology/deathDate> ?d .
}
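Since the question is about ages at death, here is a minimal Python sketch of how the same idea could be scripted. It assumes the endpoint accepts `query` and `format` URL parameters and returns standard SPARQL JSON results, and that a `dbo:birthDate` property is present alongside `dbo:deathDate`. The function names, the `LIMIT 100`, and the choice to skip malformed dates are illustrative assumptions, not part of the original answer.

```python
import json
import urllib.parse
import urllib.request
from datetime import date

ENDPOINT = "http://dbpedia.org/sparql"

# Assumption: dbo:birthDate exists for the people we care about.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?p ?b ?d WHERE {
  ?p a dbo:Person ;
     dbo:birthDate ?b ;
     dbo:deathDate ?d .
} LIMIT 100
"""


def build_url(query=QUERY):
    """Build a GET URL asking the endpoint for SPARQL JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return ENDPOINT + "?" + params


def age_at_death(birth, death):
    """Whole years between two ISO dates (YYYY-MM-DD, extra suffix ignored)."""
    b = date.fromisoformat(birth[:10])
    d = date.fromisoformat(death[:10])
    # Subtract one year if the death fell before that year's birthday.
    return d.year - b.year - ((d.month, d.day) < (b.month, b.day))


def fetch_ages(url=None):
    """Yield (person URI, age) pairs; rows with unparsable dates are skipped."""
    with urllib.request.urlopen(url or build_url()) as resp:
        rows = json.load(resp)["results"]["bindings"]
    for row in rows:
        try:
            yield row["p"]["value"], age_at_death(row["b"]["value"], row["d"]["value"])
        except ValueError:
            continue
```

Calling `list(fetch_ages())` performs the network request. A public endpoint may cap result sizes, so a full extraction would probably need paging with `LIMIT` and `OFFSET`.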
Upvotes: 0
Reputation: 8334
You can find a dump of all the contents of Wikipedia available for download here:
http://dumps.wikimedia.org/enwiki/latest/
The file is an .xml file of several gigabytes in size and contains the text of all the pages on Wikipedia (amongst other things). How you process this depends on what programming language you're going to use.
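As one possibility in Python, the dump can be streamed page by page rather than loaded whole, and the `death date and age` template from the question can be picked out with a regular expression. This is only a sketch: the helper names are made up for illustration, it assumes each page has `title` and `text` elements, and it only handles the fully numeric form of the template shown in the question (partial dates and other template variants are ignored).

```python
import re
import xml.etree.ElementTree as ET

# Matches {{death date and age|...}} and captures the parameter list.
TEMPLATE = re.compile(r"\{\{\s*death date and age\s*\|([^{}]*)\}\}", re.IGNORECASE)


def parse_death_and_age(wikitext):
    """Return ((dy, dm, dd), (by, bm, bd), age) or None if no usable template."""
    m = TEMPLATE.search(wikitext)
    if not m:
        return None
    # Keep only purely numeric parameters; named ones such as mf=yes are dropped.
    nums = [int(p) for p in m.group(1).split("|") if p.strip().isdigit()]
    if len(nums) < 6:
        return None
    dy, dm, dd, by, bm, bd = nums[:6]
    age = dy - by - ((dm, dd) < (bm, bd))
    return (dy, dm, dd), (by, bm, bd), age


def _local(tag):
    """Strip the XML namespace prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each page; source is a path or binary file object."""
    title, text = None, ""
    for _, elem in ET.iterparse(source, events=("end",)):
        name = _local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the finished page's children to limit memory use
```

A loop such as `for title, text in iter_pages(path): print(title, parse_death_and_age(text))` would then list the matches, where `path` is wherever you saved the dump. If the download is bz2-compressed, the file object returned by the standard library's `bz2.open(path, "rb")` can be passed instead of a path.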
Upvotes: 1