What is the interpretation of the regular expression given below?
r'KT: {(<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+}'
I don't know what KT is, but JJ is an adjective, NN is a noun, and IN is a preposition.
EDIT: Reposting the link: http://bdewilde.github.io/blog/2014/09/23/intro-to-automatic-keyphrase-extraction/
Assuming you're working with the Penn Treebank part-of-speech tags, <NN.*>+ matches one or more of:
NN: Noun, singular or mass
NNS: Noun, plural
NNP: Proper noun, singular
NNPS: Proper noun, plural
<JJ>* matches zero or more adjectives (plain JJ only, not the comparative JJR or the superlative JJS), so that part is optional.
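A quick way to check which tags those two patterns accept is to run the part inside < > as an ordinary Python regex against a few Penn tags. This is only an approximation: the tag list is a hand-picked subset, and NLTK adjusts . internally so it cannot run past the end of a single tag.

```python
import re

# Hand-picked subset of Penn Treebank tags, for illustration only.
TAGS = ["NN", "NNS", "NNP", "NNPS", "JJ", "JJR", "JJS", "IN"]

# The text inside <...> behaves like a regex over the tag name:
# NN.* accepts any tag starting with "NN", while JJ accepts only "JJ".
noun_tags = [t for t in TAGS if re.fullmatch(r"NN.*", t)]
adj_tags = [t for t in TAGS if re.fullmatch(r"JJ", t)]

print(noun_tags)  # ['NN', 'NNS', 'NNP', 'NNPS']
print(adj_tags)   # ['JJ']
```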
The <JJ>* <NN.*>+ part of your regex thus matches one or more nouns, optionally preceded by any number of adjectives. For example:
cats
brown cats
cute brown cats
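To make those examples concrete, here is a rough plain-Python sketch (no NLTK) that joins a tag sequence into a string like <JJ><NNS> and tests it against a regex equivalent of <JJ>* <NN.*>+. The tags are assigned by hand, and is_simple_np is a hypothetical helper, not part of any library.

```python
import re

# Hypothetical translation of the tag pattern <JJ>* <NN.*>+ into a plain regex
# over a string like "<JJ><NNS>". [^<>]* plays the role of .* but cannot
# cross into the next tag.
NP_RE = re.compile(r"(?:<JJ>)*(?:<NN[^<>]*>)+")

def is_simple_np(tags):
    """Return True if the whole tag sequence matches <JJ>* <NN.*>+."""
    return NP_RE.fullmatch("".join(f"<{t}>" for t in tags)) is not None

# Hand-assigned tags for each phrase (illustrative, not from a real tagger).
examples = {
    "cats": ["NNS"],
    "brown cats": ["JJ", "NNS"],
    "cute brown cats": ["JJ", "JJ", "NNS"],
    "brown": ["JJ"],            # no noun, so no match
    "cats of": ["NNS", "IN"],   # a trailing preposition is not covered here
}
for phrase, tags in examples.items():
    print(f"{phrase!r}: {is_simple_np(tags)}")
```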
(<JJ>* <NN.*>+ <IN>)? means that the above can optionally be preceded by another such noun phrase followed by a preposition (IN), for example:
green eyes of cute brown cats
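The same trick extends to the whole rule. The sketch below (again plain re, with a hypothetical find_chunks helper and a hand-tagged sentence) scans a tagged sentence for non-overlapping matches from left to right and maps them back to words. It mimics the idea rather than reproducing NLTK's RegexpParser.

```python
import re

# Hypothetical regex translation of the full grammar rule
#   (<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+
# applied to a string of tags such as "<JJ><NNS><IN>...".
NP = r"(?:<JJ>)*(?:<NN[^<>]*>)+"
KT_RE = re.compile(rf"(?:{NP}<IN>)?{NP}")

def find_chunks(tagged):
    """Yield the word spans whose tags match the rule, left to right."""
    tag_str = "".join(f"<{tag}>" for _, tag in tagged)
    for m in KT_RE.finditer(tag_str):
        # Every tag starts with "<", so counting "<" turns offsets into token indices.
        start = tag_str.count("<", 0, m.start())
        end = tag_str.count("<", 0, m.end())
        yield " ".join(word for word, _ in tagged[start:end])

# Hand-tagged sentence (tags assigned for illustration, not by a real tagger).
sentence = [("I", "PRP"), ("love", "VBP"), ("the", "DT"), ("green", "JJ"),
            ("eyes", "NNS"), ("of", "IN"), ("cute", "JJ"), ("brown", "JJ"),
            ("cats", "NNS")]
print(list(find_chunks(sentence)))  # ['green eyes of cute brown cats']
```

Note that the optional prefix only sticks when a second noun phrase follows the preposition; for "eyes of" alone the match falls back to just "eyes".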
KT is not a part-of-speech tag. The code you've referenced works with NLTK's RegexpParser, where grammars are (roughly) defined as Label: {rules}. So KT is really just a label that each identified noun phrase will take; you might as well name it NP or NounPhrase.
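To illustrate that the label plays no part in the matching, this simplified sketch splits a one-line Label: {pattern} rule into its two parts. parse_rule is a hypothetical helper and far less complete than NLTK's own grammar reader; the point is only that renaming KT to NP leaves the pattern untouched.

```python
import re

def parse_rule(rule):
    """Split a one-line rule of the form 'Label: {pattern}' (simplified sketch)."""
    m = re.fullmatch(r"\s*(\w+)\s*:\s*\{(.*)\}\s*", rule)
    if m is None:
        raise ValueError(f"not a 'Label: {{pattern}}' rule: {rule!r}")
    return m.group(1), m.group(2).strip()

for rule in [r"KT: {(<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+}",
             r"NP: {(<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+}"]:
    label, pattern = parse_rule(rule)
    # Only the label differs; the tag pattern is identical for both rules.
    print(label, "->", pattern)
```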