Rupjit Chakraborty

Reputation: 115

Noun Phrase Extraction Regular Expression

What is the interpretation of the regular expression given below?

r'KT: {(<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+}'

I don't know what KT is, but JJ is an adjective, NN is a noun, and IN is a preposition.

EDIT: reposting the link http://bdewilde.github.io/blog/2014/09/23/intro-to-automatic-keyphrase-extraction/

Upvotes: 0

Views: 1251

Answers (1)

sam

Reputation: 1436

Assuming you're working with the Penn part-of-speech tags,

<NN.*>+ matches one or more consecutive tokens whose tag starts with NN, that is, any of:

  • NN: Noun, singular or mass
  • NNS: Noun, plural
  • NNP: Proper noun, singular
  • NNPS: Proper noun, plural

<JJ>* matches zero or more adjectives (JJ covers plain adjectives only, not comparative or superlative ones), so the adjectives are optional.

The <JJ>* <NN.*>+ part of your RegEx thus matches at least one noun. That noun can be preceded by any number of adjectives. For example:

  • cats
  • brown cats
  • cute brown cats

The optional group (<JJ>* <NN.*>+ <IN>)? means that the noun phrase above can be preceded by another noun phrase followed by a preposition (IN), such as

  • green eyes of cute brown cats
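To make the matching behaviour concrete, here is a minimal sketch that emulates the tag pattern with Python's standard `re` module. It is not how NLTK is implemented; the `encode` and `matched_tags` helpers and the `<TAG>` string encoding are assumptions made purely for illustration.

```python
import re

def encode(tags):
    """Hypothetical helper: turn ['JJ', 'NNS'] into the string '<JJ><NNS>'."""
    return "".join(f"<{t}>" for t in tags)

# <JJ>* <NN.*>+  : zero or more adjectives, then one or more NN-family tags.
# [^<>]* plays the role of .* but cannot run past the closing '>'.
NOUN_PHRASE = r"(?:<JJ>)*(?:<NN[^<>]*>)+"

# (<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+
KT_PATTERN = re.compile(rf"(?:{NOUN_PHRASE}<IN>)?{NOUN_PHRASE}")

def matched_tags(tags):
    """Return the tags covered by the first match, or [] if nothing matches."""
    m = KT_PATTERN.search(encode(tags))
    return re.findall(r"<([^<>]+)>", m.group(0)) if m else []

# Tag sequences written by hand for the example phrases above.
examples = {
    "cats": ["NNS"],
    "cute brown cats": ["JJ", "JJ", "NNS"],
    "green eyes of cute brown cats": ["JJ", "NNS", "IN", "JJ", "JJ", "NNS"],
}

for phrase, tags in examples.items():
    print(phrase, "->", matched_tags(tags))
```

Each example is matched in full, whereas a sequence with no noun in it, such as an adjective followed by a preposition, produces no match at all.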

KT is not a part-of-speech tag. The code you've referenced works with NLTK's RegexpParser, where grammars are (roughly) defined as Label: {rules}. So KT is really just a label that each identified noun phrase will take; you might as well name it NP or NounPhrase.
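As a rough illustration of the label's role, the sketch below feeds the grammar to `nltk.RegexpParser` and collects the subtrees labelled KT. The sentence is tagged by hand (an assumption, so that no tagger model needs to be downloaded); a real tagger might assign different tags, for example NN instead of JJ for "brown".

```python
import nltk

# The grammar from the question; "KT" is only the label given to each chunk.
grammar = r'KT: {(<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+}'
chunker = nltk.RegexpParser(grammar)

# Hand-tagged (word, tag) pairs, assumed for illustration.
tagged = [
    ("green", "JJ"), ("eyes", "NNS"), ("of", "IN"),
    ("cute", "JJ"), ("brown", "JJ"), ("cats", "NNS"),
    ("sleep", "VBP"),
]

tree = chunker.parse(tagged)

# Keep only the chunks that received the KT label and join their words.
phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "KT")
]
print(phrases)
```

Renaming KT to NP in both the grammar string and the filter should give the same chunks under a different label.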

Upvotes: 2
