asher drummond

Reputation: 165

AttributeError: 'tuple' object has no attribute 'endswith' Python NLTK Lemmatizer

I am creating a preprocessor for an NLP project, and the lemmatizer is not working as expected. I expected the code to lemmatize every word, but instead I get the error AttributeError: 'tuple' object has no attribute 'endswith'. Sorry if it is a stupid mistake, but what am I doing wrong? I am using Python. Here is my code:

import re
from pymongo import MongoClient
from nltk import *
import nltk

lemma = WordNetLemmatizer()
client = MongoClient()
db = client.qa
main = db.main

while True:
    question = input('Ask a question: ').upper()
    question = re.sub(r'[^0-9A-Z\s]', '', question)
    question = word_tokenize(question)
    question = nltk.pos_tag(question)
    for each in question:
        lemma.lemmatize(each)  # this line raises the AttributeError
    print(question)
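
For reference, the error can be reproduced in isolation: pos_tag returns (word, tag) tuples, and passing one of those tuples straight to the lemmatizer fails (a minimal session, traceback abbreviated):

>>> from nltk import WordNetLemmatizer
>>> WordNetLemmatizer().lemmatize(('THIS', 'DT'))
Traceback (most recent call last):
  ...
AttributeError: 'tuple' object has no attribute 'endswith'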

Update:

I have updated the code so that it runs without errors, but it is not actually lemmatizing the words now. Here is the updated code:

import re
from pymongo import MongoClient
from nltk import *

lemma = WordNetLemmatizer()
client = MongoClient()
db = client.qa
main = db.main

while True:
    question = input('Ask a question: ').upper()
    question = re.sub(r'[^0-9A-Z\s]', '', question)
    question = word_tokenize(question)
    for each in question:
        lemma.lemmatize(each[0])  # the returned lemma is never stored
    print(question)
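
For reference, each here is now a plain token string, so each[0] is only its first character, and lemmatize returns the lemma rather than changing the token in place (a quick illustrative session):

>>> each = 'PARTS'
>>> each[0]
'P'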

Upvotes: 1

Views: 1230

Answers (1)

alvas

Reputation: 122012

TL;DR:

import re
from pymongo import MongoClient
from nltk import word_tokenize, pos_tag, WordNetLemmatizer

wnl = WordNetLemmatizer()
client = MongoClient()
db = client.qa
main = db.main

while True:
    question = input('Ask a question: ').upper()
    question = re.sub(r'[^0-9A-Z\s]', '', question)
    question = word_tokenize(question)
    question = pos_tag(question)
    # each is a (word, pos) tuple, so lemmatize each[0] and keep the result
    lemmas = [wnl.lemmatize(each[0]) for each in question]
    print(lemmas)
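
Two things changed relative to the question's code: the word is indexed out of each (word, pos) tuple with each[0], and the string that lemmatize returns is kept, since the method does not modify the token in place. Note also that WordNet entries are lowercase, so tokens uppercased with .upper() will usually come back unchanged; lowercase them before lemmatizing if you want real lemmas.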

Explanation in comments:

>>> from nltk import word_tokenize, pos_tag, WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> sent = "this is a two parts sentence, with some weird lemmas"
>>> word_tokenize(sent) # Returns a list of strings
['this', 'is', 'a', 'two', 'parts', 'sentence', ',', 'with', 'some', 'weird', 'lemmas']
>>> pos_tag(word_tokenize(sent)) # Returns a list of (word, pos) tuples
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('two', 'CD'), ('parts', 'NNS'), ('sentence', 'NN'), (',', ','), ('with', 'IN'), ('some', 'DT'), ('weird', 'JJ'), ('lemmas', 'NN')]
>>> pos_tag(word_tokenize(sent))[0]
('this', 'DT')
>>> pos_tag(word_tokenize(sent))[0][0]
'this'
>>> each = pos_tag(word_tokenize(sent))[0][0]
>>> each
'this'
>>> wnl.lemmatize(each)
'this'
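
To see the lemmatizer actually change a token, pick one with an inflected form; note too that WordNet entries are lowercase, so fully uppercased input comes back as-is (continuing the same session):

>>> wnl.lemmatize('parts')
'part'
>>> wnl.lemmatize('lemmas')
'lemma'
>>> wnl.lemmatize('PARTS')  # no lowercase match in WordNet, returned unchanged
'PARTS'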

Upvotes: 2
