Reputation: 1404
I am looking for tools to find part-of-speech (POS) patterns in a corpus of documents. I am using the Stanford NLP tools to POS-tag my documents. Now I would like to query these tagged documents and find specific POS patterns, for example:
NP is JJ (e.g. the movie is nice)
or JJ NP (e.g. excellent foie gras)
Is there a tool that can do this for me in a simple and efficient manner, or do I need to write my own?
Upvotes: 2
Views: 968
Reputation: 5759
From Stanford CoreNLP, you can also use TokensRegex to match a pattern in a list of tokens: http://nlp.stanford.edu/software/tokensregex.shtml
For example, your two patterns would be something like:
[{tag:NN}] [{word:is}] [{tag:JJ}]
[{tag:JJ}] [{tag:NN}]
(Side note: NP is not a POS tag. What you probably want is [{tag:/N.*/}] to match any noun tag and [{lemma:be}] to match any form of "be", which catches a broader range of cases.)
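If you would rather write your own matcher, here is a minimal sketch in plain Python of the same idea. It is not TokensRegex and does not call any Stanford API: it assumes you have already exported your tagged documents as (word, tag) pairs. The function name `find_matches`, the pattern constants and the sample sentence with its tags are all made up for illustration.

```python
import re

# Hypothetical pre-tagged input: (word, Penn Treebank tag) pairs,
# assumed to have been exported from the POS tagger beforehand.
tagged = [
    ("the", "DT"), ("movie", "NN"), ("is", "VBZ"), ("nice", "JJ"), (".", "."),
    ("excellent", "JJ"), ("foie", "NN"), ("gras", "NN"),
]

def find_matches(tagged_tokens, pattern):
    """Return every run of words that satisfies a fixed-length pattern.

    pattern is a list of (word_regex, tag_regex) pairs, one per token;
    None in either position means "no constraint".
    """
    matches = []
    n = len(pattern)
    # Slide a window of the pattern's length over the token list.
    for start in range(len(tagged_tokens) - n + 1):
        window = tagged_tokens[start:start + n]
        if all(
            (w_re is None or re.fullmatch(w_re, word, re.IGNORECASE))
            and (t_re is None or re.fullmatch(t_re, tag))
            for (word, tag), (w_re, t_re) in zip(window, pattern)
        ):
            matches.append(" ".join(word for word, _ in window))
    return matches

# Rough equivalents of the two patterns above (any noun tag, a few forms of "be").
NOUN_BE_ADJ = [(None, r"N.*"), (r"is|are|was|were", None), (None, r"JJ")]
ADJ_NOUN = [(None, r"JJ"), (None, r"N.*")]

print(find_matches(tagged, NOUN_BE_ADJ))
print(find_matches(tagged, ADJ_NOUN))
```

Note that this sketch only handles fixed-length patterns with one constraint pair per token, so the second pattern picks up a single noun after the adjective rather than a whole noun phrase. Quantifiers, lemmas and capture groups are what a dedicated tool gives you on top of this.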
Upvotes: 2
Reputation: 11494
One tool to consider is the Corpus Workbench: http://cwb.sourceforge.net/
Upvotes: 1