Reputation: 1121
NLTK chunk grammars use regular expressions over POS tags, such as:
<DT>? <JJ>* <NN>*
Is there a way to include words within the regex? For example: "<N> <such> <as> <N> <and> <N>"
Upvotes: 2
Views: 494
Reputation: 5498
The easiest way is to convert the tags of the words: replace the POS tag of each word you want to match literally with a tag derived from the word itself, then use that tag in the regular expression.
Example:
import nltk
pos_tags = nltk.pos_tag(nltk.word_tokenize('Tea such as Green and Brown.'))
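# words that should be matched literally in the grammar
same_word_tags = ['such', 'as', 'and']
# replace the POS tag of each such word with the upper-cased word itself
pos_tags = [
    (w, w.upper()) if w.lower() in same_word_tags else (w, t)
    for w, t in pos_tags
]
# the retagged words can now be referenced like any other tag
grammar = "CHUNK: {<NN.*> <SUCH> <AS> <NN.*> <AND> <NN.*>}"
tree = nltk.RegexpParser(grammar).parse(pos_tags)
print(tree)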
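The retagging step above can also be wrapped in a small helper. This is only a sketch: retag_literal_words is a hypothetical name, not part of NLTK, and the sample tags are written by hand (assumed, not real tagger output) so that the snippet runs without any NLTK models. It compares words case-insensitively, which is a design choice of this sketch.

```python
def retag_literal_words(tagged_tokens, literal_words):
    """Return a copy of tagged_tokens in which every literal word
    gets its own upper-cased form as its tag."""
    literals = {w.lower() for w in literal_words}
    return [
        (word, word.upper()) if word.lower() in literals else (word, tag)
        for word, tag in tagged_tokens
    ]

# Hand-written tags (an assumption) standing in for nltk.pos_tag output.
sample = [("Tea", "NN"), ("such", "JJ"), ("as", "IN"), ("Green", "NNP"),
          ("and", "CC"), ("Brown", "NNP"), (".", ".")]
retagged = retag_literal_words(sample, ["such", "as", "and"])
print(retagged)
```

The returned list can be passed to nltk.RegexpParser(grammar).parse(...) in the same way as pos_tags in the example above.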
Upvotes: 0
Reputation: 107337
As far as I remember, <DT>? <JJ>* <NN>*
is a chunk pattern, and chunk patterns are converted internally to regular expressions by the tag_pattern2re_pattern()
function:
>>> from nltk.chunk import tag_pattern2re_pattern
>>> tag_pattern2re_pattern('<DT>?<NN.*>+')
'(<(DT)>)?(<(NN[^\\{\\}<>]*)>)+'
You could then put your words inside the resulting regex pattern.
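To see what such a converted pattern actually matches, here is a sketch that uses only Python's re module. It assumes, as far as I understand NLTK's chunker, that the regex is applied to a string built from the tags (for example <DT><NN><NNS>) and not to the words themselves. The tag strings below are hand-written for illustration, and matches_whole is a hypothetical helper name.

```python
import re

# Regex produced by tag_pattern2re_pattern('<DT>?<NN.*>+'),
# copied from the interpreter output above.
chunk_re = re.compile(r'(<(DT)>)?(<(NN[^\{\}<>]*)>)+')

def matches_whole(tag_string):
    """True if the whole tag string is covered by the chunk regex."""
    return chunk_re.fullmatch(tag_string) is not None

print(matches_whole("<DT><NN><NNS>"))  # True: optional DT, then two NN* tags
print(matches_whole("<NN>"))           # True: the DT is optional
print(matches_whole("<DT><JJ>"))       # False: no NN* tag follows
```

If that assumption holds, a literal word can only be matched once it appears in the tag sequence, for example as a <SUCH> tag.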
Upvotes: 2