This must be simple, but I'm missing it somehow. I have the code:
import nltk

with open('...\\t.txt', 'r') as f:
    raw = f.read()
tokens = nltk.word_tokenize(raw)
print(nltk.pos_tag(tokens))
which returns, for instance:
[('processes', 'NNS'), ('a', 'DT'), ('sequence', 'NN'), ('of', 'IN'), ('words', 'NNS')]
How can I collect only the pairs tagged 'NN', for example, or only those tagged 'DT' and 'IN', instead of every item in the list?
Thanks in advance.
You can extract only the tags you want with a list comprehension, e.g.:
>>> tags = nltk.pos_tag(tokens)
>>> dt_tags = [t for t in tags if t[1] == "DT"]
>>> dt_tags
[('a', 'DT')]
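The question also asks about collecting several tags at once (e.g. 'DT' and 'IN'). A minimal sketch extending the same list comprehension follows: it tests membership in a set of wanted tags, and uses a prefix check to catch every noun tag. The tagged list is hard-coded from the question's output (an assumption made only so the snippet runs without the NLTK tagger models); in real use it would come from `nltk.pos_tag(tokens)`.

```python
# Tagged output copied from the question; in practice this would be
# the result of nltk.pos_tag(tokens).
tags = [('processes', 'NNS'), ('a', 'DT'), ('sequence', 'NN'),
        ('of', 'IN'), ('words', 'NNS')]

# Keep every (word, tag) pair whose tag is in a set of wanted tags.
wanted = {"DT", "IN"}
dt_in_tags = [(word, tag) for word, tag in tags if tag in wanted]
print(dt_in_tags)  # [('a', 'DT'), ('of', 'IN')]

# Keep only the words for every noun tag: the Penn Treebank noun tags
# NN, NNS, NNP and NNPS all start with "NN".
nouns = [word for word, tag in tags if tag.startswith("NN")]
print(nouns)  # ['processes', 'sequence', 'words']
```

A set makes the membership test cheap even with many wanted tags; use `tag == "NN"` instead of the prefix check if you really want only singular common nouns.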