indieman

Reputation: 1121

Include words in nltk regular expressions

NLTK regular expressions work with tags such as:

<DT>? <JJ>* <NN>*

Is there a way to include literal words within the pattern? For example: "<N> <such> <as> <N> <and> <N>"

Upvotes: 2

Views: 494

Answers (2)

Pratyush

Reputation: 5498

The easiest way is to change the tag of each word you want to match literally, so that the word itself can be used as a tag in the regular expression.

Example:

import nltk

pos_tags = nltk.pos_tag(nltk.word_tokenize('Tea such as Green and Brown.'))

# words that should be matched literally in the grammar
same_word_tags = ['such', 'as', 'and']
pos_tags = [
    (w, w.upper()) if w in same_word_tags else (w, t)
    for w, t in pos_tags
]

grammar = "CHUNK: {<NN.*> <SUCH> <AS> <NN.*> <AND> <NN.*>}"
tree = nltk.RegexpParser(grammar).parse(pos_tags)
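One thing to watch for: the membership test above is case-sensitive, so a sentence starting with "Such" would keep its original tag. A minimal sketch of a reusable helper that lower-cases before comparing is below. The name retag_literal_words and the hand-written tags (standing in for nltk.pos_tag output) are assumptions for illustration, not part of NLTK.

```python
def retag_literal_words(tagged_tokens, literal_words):
    """Replace the POS tag of each listed word with the upper-cased word.

    Hypothetical helper: compares case-insensitively so that "Such"
    and "such" are both retagged as SUCH.
    """
    literals = {w.lower() for w in literal_words}
    return [
        (word, word.upper()) if word.lower() in literals else (word, tag)
        for word, tag in tagged_tokens
    ]


# Hand-written tags standing in for nltk.pos_tag output (an assumption).
tagged = [
    ("Such", "JJ"), ("teas", "NNS"), ("as", "IN"),
    ("Green", "NNP"), ("and", "CC"), ("Brown", "NNP"),
]
retagged = retag_literal_words(tagged, ["such", "as", "and"])
```

The returned list has the same shape as the pos_tags list in the example, so it should be usable with nltk.RegexpParser(grammar).parse(...) in the same way.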

Upvotes: 0

Kasravnd

Reputation: 107337

As I remember, <DT>? <JJ>* <NN>* is a chunk pattern, and chunk patterns are converted internally to regular expressions using the tag_pattern2re_pattern() function:

>>> from nltk.chunk.regexp import tag_pattern2re_pattern
>>> tag_pattern2re_pattern('<DT>?<NN.*>+')
'(<(DT)>)?(<(NN[^\\{\\}<>]*)>)+'

You could then put your words inside the resulting regex pattern.
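One caveat worth checking before relying on this: as far as I can tell, the generated regex is matched against a string built from the tags only (something like "<DT><NN>"), not from the words. The sketch below imitates that idea with the standard re module. It is a simplified illustration, not NLTK's actual implementation, and encode_tags plus the hand-written tags are assumptions. It shows why a word has to appear as a tag before a pattern can match it.

```python
import re


def encode_tags(tagged_tokens):
    """Build a string such as '<NN><SUCH>' from the tags alone (hypothetical helper)."""
    return "".join(f"<{tag}>" for _, tag in tagged_tokens)


# Hand-written tags; 'such', 'as' and 'and' are already retagged to the words themselves.
tagged = [
    ("Tea", "NN"), ("such", "SUCH"), ("as", "AS"),
    ("Green", "NNP"), ("and", "AND"), ("Brown", "NNP"), (".", "."),
]
tag_string = encode_tags(tagged)

# Hand-written regex playing the role of a converted chunk pattern.
pattern = re.compile(r"<NN[^<>]*><SUCH><AS><NN[^<>]*><AND><NN[^<>]*>")
match = pattern.search(tag_string)

# Map the matched span back to token positions by counting '<' characters.
start = tag_string[:match.start()].count("<")
end = start + match.group(0).count("<")
chunk_words = [word for word, _ in tagged[start:end]]
```

Here chunk_words holds the words covered by the match. With the original tags (IN, CC and so on) the same pattern finds nothing, because the words never appear in the string being searched.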

Upvotes: 2
