Reputation: 5591
I'm about to write some example applications and accompanying documents comparing ways of accessing information stored in relational databases. To demonstrate real-life requirements, I need to include a realistic dataset of hundreds of thousands of facts.
Is anyone aware of publicly available, free datasets of that magnitude? In particular, I'm looking for datasets of human names with realistic variance, and for hierarchical datasets such as large organizational hierarchies or large, categorized product catalogues.
If so, please point me in the right direction.
Part 1, human names: http://timecenter.cs.aau.dk/software.htm
Part 2, hierarchical data: no answer yet
Upvotes: 5
Views: 5295
Reputation: 391952
Your own PC's directory tree is a large hierarchical structure with lots of facts. You probably have a few thousand "facts" there: file names, modification dates, sizes, extra OS info, and so on.
If that's not large enough, find a server that you can log in to. That will be larger.
Not large enough? Get a web crawler and start crawling a big web site. That can be as large as you have the patience to crawl.
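To make the directory-tree idea concrete, here is a minimal sketch that walks a directory and loads it into SQLite as a parent-pointer (adjacency list) table, then runs one hierarchical query against it. The table name `node`, its columns, and the helper names `load_tree` and `max_depth` are my own assumptions for illustration, not anything from the question; swap in whatever schema your examples need.

```python
import os
import sqlite3

INSERT = ("INSERT INTO node (parent_id, name, is_dir, size, mtime) "
          "VALUES (?, ?, ?, ?, ?)")


def _add(conn, parent_id, name, path, is_dir):
    """Insert one row; size/mtime stay NULL if the entry cannot be stat'ed."""
    try:
        st = os.lstat(path)
        size, mtime = (None if is_dir else st.st_size), st.st_mtime
    except OSError:
        size, mtime = None, None
    return conn.execute(INSERT, (parent_id, name, int(is_dir), size, mtime)).lastrowid


def load_tree(root, conn):
    """Load the tree under `root` into a hypothetical `node` table.

    Returns the number of rows inserted (directories plus files).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES node(id),"
        " name TEXT NOT NULL,"
        " is_dir INTEGER NOT NULL,"
        " size INTEGER,"
        " mtime REAL)"
    )
    root = os.path.abspath(root)
    ids = {root: _add(conn, None, root, root, True)}  # directory path -> row id
    count = 1
    # os.walk is top-down by default, so a parent is always inserted
    # before its children are visited.
    for dirpath, dirnames, filenames in os.walk(root):
        parent = ids[dirpath]
        for d in dirnames:
            full = os.path.join(dirpath, d)
            ids[full] = _add(conn, parent, d, full, True)
            count += 1
        for f in filenames:
            _add(conn, parent, f, os.path.join(dirpath, f), False)
            count += 1
    conn.commit()
    return count


def max_depth(conn):
    """Depth of the deepest node, computed with a recursive CTE (root = 0)."""
    return conn.execute(
        "WITH RECURSIVE t(id, depth) AS ("
        " SELECT id, 0 FROM node WHERE parent_id IS NULL"
        " UNION ALL"
        " SELECT n.id, t.depth + 1 FROM node n JOIN t ON n.parent_id = t.id)"
        " SELECT MAX(depth) FROM t"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect("tree.db")  # file name is arbitrary
    print(load_tree(os.path.expanduser("~"), conn), "rows loaded")
    print("deepest level:", max_depth(conn))
```

Pointing `load_tree` at a home directory or a server's root should get you well past a few thousand rows, and the parent-pointer layout gives you something to compare recursive queries against other hierarchy encodings.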
Upvotes: 3
Reputation: 116237
The Wikipedia dump is pretty massive: obligatory Wikipedia link.
Upvotes: 3