Reputation: 2998
I just coded a Markov chain that generates text based on the data it has learned from. I'd like a large source of text data online, but can't seem to find one (most sites, like Wikipedia, have a lot of junk rather than plain text files).
Is there any site with a lot of plain text files suitable for testing a Markov chain?
Upvotes: 0
Views: 44
Reputation: 6847
Consider the Enron Email Dataset: https://www.cs.cmu.edu/~./enron/
It is also hosted on Amazon AWS: https://aws.amazon.com/datasets/enron-email-data/
Upvotes: 0
Reputation: 965
gutenberg.org might have some resources for you. For example, here's what appears to be a bunch of Moby Dick, in text file form.
http://www.gutenberg.org/files/2701/2701.txt
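In case it helps, here is a rough sketch of feeding a plain text file like that into a word-level, first-order Markov chain. It assumes the file wraps the book in license text delimited by lines starting with `*** START OF` and `*** END OF` (worth checking against the actual file), and the function names `strip_boilerplate`, `build_chain` and `generate` are made up for this example, not taken from the asker's code.

```python
import random
from collections import defaultdict


def strip_boilerplate(text):
    """Return the text between the '*** START OF' and '*** END OF' lines.

    Assumption: the file uses these marker lines around the book itself.
    If either marker is missing, the text is returned unchanged.
    """
    lines = text.splitlines()
    start = next((i for i, l in enumerate(lines) if l.startswith("*** START OF")), None)
    end = next((i for i, l in enumerate(lines) if l.startswith("*** END OF")), None)
    if start is None or end is None or end <= start:
        return text
    return "\n".join(lines[start + 1:end])


def build_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng=None):
    """Walk the chain from `start`, emitting at most `length` words."""
    rng = rng or random.Random()
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)
```

After saving the file locally, you would read it into a string, pass it through `strip_boilerplate`, then call `build_chain` and `generate` with a start word that occurs in the text.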
Upvotes: 2
Reputation: 5030
If your only concern is stripping the markup from Wikipedia, how about using a source like this one, which removes the tags for you?
http://kopiwiki.dsd.sztaki.hu/
Upvotes: 1