azpublic

Reputation: 1404

What tools can I use to find part-of-speech patterns?

I am looking for tools to find part-of-speech (POS) patterns in a corpus of documents. I am using the Stanford NLP tools to POS-tag my documents. Now I would like to query the tagged documents for specific POS patterns, for example:

NP is JJ (e.g. the movie is nice)

or JJ NP (e.g. excellent foie gras)

Is there a tool that can do this simply and efficiently, or do I need to write my own?

Upvotes: 2

Views: 968

Answers (2)

Gabor Angeli

Reputation: 5759

From Stanford CoreNLP, you can also use TokensRegex to match a pattern in a list of tokens: http://nlp.stanford.edu/software/tokensregex.shtml

For example, your two patterns would be something like:

[{tag:NN}] [{word:is}] [{tag:JJ}]

[{tag:JJ}] [{tag:NN}]

(Side note: NP is a phrase label, not a POS tag. What you most likely want is [{tag:/N.*/}] to match any noun tag and [{lemma:be}] to match any form of "be", which catches a broader range of cases.)
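If you end up writing your own matcher after all, the same idea can be sketched in plain Python. This is not TokensRegex, only a rough sliding-window illustration. It assumes the tagged text is stored as space-separated word_TAG pairs (an assumption about your tagger output), and the names parse_tagged and find_matches are hypothetical helpers. Since no lemmatizer is involved, the forms of "be" are listed explicitly instead of using a lemma.

```python
import re


def parse_tagged(line, sep="_"):
    """Split 'word_TAG word_TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in line.split():
        # rpartition splits on the last separator, so words that
        # themselves contain the separator are still handled.
        word, _, tag = item.rpartition(sep)
        pairs.append((word, tag))
    return pairs


def find_matches(tokens, pattern):
    """Return the word spans that match `pattern`.

    `pattern` is a list of (word_regex, tag_regex) pairs, one per token;
    None in either position means "accept anything".
    """
    hits = []
    n = len(pattern)
    # Slide a window of len(pattern) tokens over the sentence.
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(
            (w_re is None or re.fullmatch(w_re, word, re.IGNORECASE))
            and (t_re is None or re.fullmatch(t_re, tag))
            for (word, tag), (w_re, t_re) in zip(window, pattern)
        ):
            hits.append(" ".join(word for word, _ in window))
    return hits


# Rough equivalents of the two patterns above (any noun tag, a few forms of "be").
NOUN_BE_ADJ = [(None, r"N.*"), (r"is|are|was|were", None), (None, r"JJ.*")]
ADJ_NOUN = [(None, r"JJ.*"), (None, r"N.*")]

tagged = "the_DT movie_NN is_VBZ nice_JJ ,_, excellent_JJ foie_NN gras_NN"
tokens = parse_tagged(tagged)
noun_be_adj = find_matches(tokens, NOUN_BE_ADJ)
adj_noun = find_matches(tokens, ADJ_NOUN)
print(noun_be_adj)
print(adj_noun)
```

Note that each pattern slot consumes exactly one token, so the adjective-noun pattern captures only the first noun of a multi-word noun such as "foie gras". Quantifiers and real lemma matching are where a tool like TokensRegex saves you work.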

Upvotes: 2

aab

Reputation: 11494

One tool to consider is the Corpus Workbench: http://cwb.sourceforge.net/

Upvotes: 1
