Reputation: 1705
I have a data set that is annotated by the Collins parser. Right now, I am keeping the POS tag of each word in the data set as a feature. The problem is that I don't need fine-grained POS tags, so I have combined some of them. For example, I group VBD, VBP, VBZ, and VBG under the category "Verb", and for nouns I group NNP and NNS under the category "Noun".
Here is the list of POS tags that I have after combining:
VB, NN, TO, JJ, IN, EX, RB, WP, PRP, MD, UH, WRB, WDT, RP, CD, POS, DT, PRP$, WP$, CC, RBR
My question is: where can I find a list of coarse-grained POS tags? Is there a standard coarse-grained POS tag set?
In my system, I get better results if I don't combine the remaining POS tags. Is it acceptable to keep my current list, or should I combine those as well?
Thanks in advance.
Upvotes: 4
Views: 1667
Reputation: 326
You can use Petrov's universal tag set. It consists of 12 tags and can increase POS tagging efficiency drastically. You can refer to Universal POS tagset. You can also download the code and the mappings for a few taggers at POS mapping.
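To illustrate the idea, here is a minimal sketch of collapsing Penn Treebank tags into the 12 universal tags. The table below is my own partial transcription of the English Penn Treebank mapping, so treat it as an assumption and check it against the published mapping file before relying on it. The names `PTB_TO_UNIVERSAL`, `to_universal`, and `coarsen` are made up for this example and are not part of any library.

```python
# Partial Penn Treebank -> universal tag table (an assumption transcribed
# by hand; verify against the published mapping before using it).
PTB_TO_UNIVERSAL = {
    "CC": "CONJ", "CD": "NUM", "DT": "DET", "EX": "DET", "FW": "X",
    "IN": "ADP", "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ", "MD": "VERB",
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "PDT": "DET", "POS": "PRT", "PRP": "PRON", "PRP$": "PRON",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV", "RP": "PRT", "TO": "PRT",
    "UH": "X", "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB", "WDT": "DET", "WP": "PRON",
    "WP$": "PRON", "WRB": "ADV", ".": ".", ",": ".", ":": ".",
}


def to_universal(tag, default="X"):
    """Return the coarse tag for a fine-grained tag, or `default` if unknown."""
    return PTB_TO_UNIVERSAL.get(tag, default)


def coarsen(tagged_tokens):
    """Replace the tag in each (word, tag) pair with its coarse tag."""
    return [(word, to_universal(tag)) for word, tag in tagged_tokens]


if __name__ == "__main__":
    sample = [("She", "PRP"), ("has", "VBZ"), ("two", "CD"), ("dogs", "NNS"), (".", ".")]
    print(coarsen(sample))
```

Falling back to `X` for unknown tags is just one possible choice. If your parser emits tags that are missing from the table, you may prefer to raise an error so they don't get silently lumped together.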
Upvotes: 4