Reputation: 165
I am creating a preprocessor for an NLP project, and the lemmatizer is not working as expected. I expected the code to lemmatize every word, but I am seeing the error AttributeError: 'tuple' object has no attribute 'endswith'. Sorry if it is a stupid mistake, but what am I doing wrong? I am using Python. Here is my code:
from pymongo import MongoClient
from nltk import *
import nltk

lemma = WordNetLemmatizer()
client = MongoClient()
db = client.qa
main = db.main

while True:
    question = input('Ask a question: ').upper()
    question = re.sub('[^0-9A-Z\s]', '', question)
    question = word_tokenize(question)
    question = nltk.pos_tag(question)
    for each in question:
        lemma.lemmatize(each)
    print(question)
Update:
I have updated the code so that it runs without the error, but it is not actually lemmatizing the words now. Here is the updated code:
from pymongo import MongoClient
from nltk import *

lemma = WordNetLemmatizer()
client = MongoClient()
db = client.qa
main = db.main

while True:
    question = input('Ask a question: ').upper()
    question = re.sub('[^0-9A-Z\s]', '', question)
    question = word_tokenize(question)
    for each in question:
        lemma.lemmatize(each[0])
    print(question)
Upvotes: 1
Views: 1230
Reputation: 122012
TL;DR:
import re

from pymongo import MongoClient
from nltk import word_tokenize, pos_tag, WordNetLemmatizer

wnl = WordNetLemmatizer()
client = MongoClient()
db = client.qa
main = db.main

while True:
    question = input('Ask a question: ').upper()
    question = re.sub(r'[^0-9A-Z\s]', '', question)
    question = word_tokenize(question)
    question = pos_tag(question)
    for each in question:
        # each is a (word, pos) tuple, so pass only the word string to the lemmatizer.
        wnl.lemmatize(each[0])
    print(question)
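Note that the loop above only computes each lemma and then throws it away, so the print still shows the tagged tuples. If you also want to keep the lemmatized words, a minimal self-contained sketch (the tagged and lemmas names are just for illustration) would be:

from nltk import word_tokenize, pos_tag, WordNetLemmatizer

wnl = WordNetLemmatizer()
tagged = pos_tag(word_tokenize('the cats were chasing mice'))
# Keep the lemmas in a list instead of discarding them.
lemmas = [wnl.lemmatize(word) for word, pos in tagged]
print(lemmas)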
Explanation in comments:
>>> from nltk import word_tokenize, pos_tag, WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> sent = "this is a two parts sentence, with some weird lemmas"
>>> word_tokenize(sent) # Returns a list of strings
['this', 'is', 'a', 'two', 'parts', 'sentence', ',', 'with', 'some', 'weird', 'lemmas']
>>> pos_tag(word_tokenize(sent)) # Returns a list of (word, pos) tuples
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('two', 'CD'), ('parts', 'NNS'), ('sentence', 'NN'), (',', ','), ('with', 'IN'), ('some', 'DT'), ('weird', 'JJ'), ('lemmas', 'NN')]
>>> pos_tag(word_tokenize(sent))[0]
('this', 'DT')
>>> pos_tag(word_tokenize(sent))[0][0]
'this'
>>> each = pos_tag(word_tokenize(sent))[0][0]
>>> each
'this'
>>> wnl.lemmatize(each)
'this'
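Since pos_tag already gives you a Penn Treebank tag for each word, and WordNetLemmatizer.lemmatize() accepts an optional WordNet POS, you can use the tag instead of throwing it away. Here is a minimal sketch of that idea (the penn_to_wordnet helper name is just for illustration):

from nltk import word_tokenize, pos_tag, WordNetLemmatizer
from nltk.corpus import wordnet

wnl = WordNetLemmatizer()

def penn_to_wordnet(tag):
    # Map a Penn Treebank tag to one of the four WordNet POS constants;
    # fall back to noun, which is also lemmatize()'s default.
    if tag.startswith('J'):
        return wordnet.ADJ
    if tag.startswith('V'):
        return wordnet.VERB
    if tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN

sent = "this is a two parts sentence, with some weird lemmas"
lemmas = [wnl.lemmatize(word, penn_to_wordnet(pos))
          for word, pos in pos_tag(word_tokenize(sent))]
print(lemmas)

With the POS passed in, 'is' is lemmatized to 'be' rather than left unchanged.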
Upvotes: 2