What tools can I use to find part-of-speech patterns?

Question

I am looking for tools to find part-of-speech (POS) patterns in a corpus of documents. I am using the Stanford NLP tools to POS-tag my documents. Now I would like to query these tagged documents and find specific POS patterns, for example:

NP is JJ (ex: the movie is nice)

or JJ NP (ex: excellent foie gras)

Is there a tool that can do this simply and efficiently, or do I need to write my own?

Gabor Angeli · Accepted Answer

From Stanford CoreNLP, you can also use TokensRegex to match a pattern in a list of tokens: http://nlp.stanford.edu/software/tokensregex.shtml

For example, your two patterns would be something like:

[{tag:NN}] [{word:is}] [{tag:JJ}]

[{tag:JJ}] [{tag:NN}]

(Side note: NP is not a POS tag; it is a phrase label for a noun phrase. What you most likely want is [{tag:/N.*/}] for the noun and [{lemma:be}] for the verb, to catch a broader range of cases.)
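
If you do end up writing your own, the idea behind these patterns is small enough to sketch in plain Python. This is not TokensRegex and does not use any Stanford API; it is a standard-library stand-in that slides a window of per-token tests over (word, tag) pairs. It assumes the tagger output is in word_TAG form with "_" as the separator (adjust `sep` if yours differs). The names `parse_tagged`, `tag_is`, `word_in` and `find_matches` are made up for this illustration, and since plain tagger output has no lemmas, [{lemma:be}] is approximated with a hand-written list of forms of "be".

```python
import re

# Assumption: a hand-written stand-in for [{lemma:be}], since plain
# word_TAG output carries no lemma information.
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def parse_tagged(line, sep="_"):
    """Split 'word_TAG word_TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in line.split():
        word, _, tag = item.rpartition(sep)
        pairs.append((word, tag))
    return pairs


def tag_is(regex):
    """Token test: the whole POS tag must match the regex."""
    pattern = re.compile(regex)
    return lambda word, tag: pattern.fullmatch(tag) is not None


def word_in(words):
    """Token test: the lowercased word must be one of the given words."""
    return lambda word, tag: word.lower() in words


def find_matches(tokens, tests):
    """Return every run of consecutive tokens that passes the tests in order."""
    n = len(tests)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(test(w, t) for test, (w, t) in zip(tests, window)):
            hits.append(" ".join(w for w, _ in window))
    return hits


# Rough equivalents of the two patterns discussed above.
NOUN_BE_ADJ = [tag_is(r"N.*"), word_in(BE_FORMS), tag_is(r"JJ")]
ADJ_NOUN = [tag_is(r"JJ"), tag_is(r"N.*")]

tokens = parse_tagged(
    "The_DT movie_NN is_VBZ nice_JJ and_CC we_PRP had_VBD "
    "excellent_JJ foie_NN gras_NN"
)
print(find_matches(tokens, NOUN_BE_ADJ))  # ['movie is nice']
print(find_matches(tokens, ADJ_NOUN))     # ['excellent foie']
```

Note that the second result is "excellent foie", not "excellent foie gras": each test consumes exactly one token, so a single noun tag does not cover a multi-word noun phrase. That is the practical consequence of NP not being a POS tag. Matching whole noun phrases needs either a repeated noun test or a parser/chunker.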
