Reputation: 14664
I am currently working on a project on person name disambiguation. The idea is that it should identify the correct person when multiple people share the same name. I have used Wikipedia as my data source. I want to evaluate the project on some standard data, so I am looking for test data. I am not familiar with which names are popular on Wikipedia. Any idea where I can find such data? I do not need a vast amount, just some 100-500 examples.
Thank you
Adding more information to the question.
What I am looking for is examples of people who share the same name but are actually different people. For example, Michael Jordan is a famous basketball player, and there is also a statistician with that name. I am looking for examples like this.
http://en.wikipedia.org/wiki/Michael_Jordan http://en.wikipedia.org/wiki/Michael_I._Jordan
I hope the question is clearer now.
Upvotes: 1
Views: 704
Reputation: 7141
Datasets for testing:
Good luck!
Upvotes: 2
Reputation: 44786
http://en.wikipedia.org/wiki/Category:Redirects_to_disambiguation_pages is a huge list of disambiguation pages on Wikipedia. Every page linked from it contains links to pages for things with ambiguous names. Is that what you're looking for?
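If you would rather pull that list programmatically than browse it, here is a minimal sketch using the MediaWiki API's `list=categorymembers` query. The helper names (`build_query_url`, `extract_titles`, `fetch_titles`) and the User-Agent string are made up for illustration, and the category mixes all kinds of ambiguous titles, so you would still need to filter for person names yourself.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
CATEGORY = "Category:Redirects to disambiguation pages"


def build_query_url(category, limit=100, cont=None):
    """Build a MediaWiki API URL listing the members of a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": str(limit),
        "format": "json",
    }
    if cont:
        # Continuation token returned by a previous response, for paging.
        params["cmcontinue"] = cont
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_titles(response):
    """Pull page titles out of a decoded categorymembers response."""
    members = response.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]


def fetch_titles(category=CATEGORY, limit=100):
    """Fetch one page of category member titles (performs a network call)."""
    request = urllib.request.Request(
        build_query_url(category, limit),
        # Hypothetical client name; use something that identifies your tool.
        headers={"User-Agent": "name-disambiguation-test/0.1"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_titles(json.load(resp))
```

Calling `fetch_titles(limit=500)` should give roughly the 100-500 candidates asked for in one request; for more, pass the `cmcontinue` value from the response's `continue` object back in as `cont`.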
Upvotes: 0
Reputation: 771
I'm wondering why you can't use the names of SO users: https://stackoverflow.com/users?tab=reputation
The list is already ranked by reputation, so you know which names are "popular".
Upvotes: 0