Reputation: 479
I want to implement a part-of-speech tagger, but I don't know where I can get a lot of training data. Thanks!
Upvotes: 4
Views: 6771
Reputation: 2710
https://catalog.ldc.upenn.edu/LDC99T42 <--- They want $1700.00 or $850.00 if you have a Reduced-License :-(
https://www.kaggle.com/nltkdata/penn-tree-bank <--- You gotta love Kaggle!
https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus/version/4 <--- You gotta love Kaggle even more!
Upvotes: 3
Reputation:
There's a training set and testing set from the chunking shared task of the CoNLL-2000 conference here:
http://www.cnts.ua.ac.be/conll2000/chunking/
Others have used this to train part-of-speech taggers:
https://code.google.com/p/miralium/wiki/PosTaggerTutorial
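If it helps, here's a rough sketch of reading that data into (word, tag) pairs for tagger training. The CoNLL-2000 files have one token per line with whitespace-separated columns (word, POS tag, chunk tag) and a blank line between sentences. The function name read_conll2000 and the inline sample are my own, just for illustration, and the chunk column is simply dropped.

```python
from typing import Iterable, List, Tuple

# One sentence is a list of (word, POS tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll2000(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-2000 style lines into sentences of (word, POS) pairs.

    Hypothetical helper: keeps columns 1 and 2, ignores the chunk tag.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line marks the end of a sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        if len(fields) < 2:
            # Skip malformed lines that have no POS column.
            continue
        current.append((fields[0], fields[1]))
    if current:
        # Flush the last sentence if the input lacks a trailing blank line.
        sentences.append(current)
    return sentences


# Small inline sample in the same column layout (word POS chunk).
sample = """Confidence NN B-NP
in IN B-PP
the DT B-NP
pound NN I-NP
. . O

He PRP B-NP
reckons VBZ B-VP
"""

tagged = read_conll2000(sample.splitlines())
print(len(tagged), "sentences;", "first token:", tagged[0][0])
```

For the real data you would pass an open file handle instead of the sample, e.g. `read_conll2000(open("train.txt", encoding="utf-8"))`, assuming you have downloaded and unpacked the training file under that name.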
Upvotes: 5