Reputation: 223
When I run POS tagging on the tokenized data, each token comes out as a (word, pos_tag) pair. When I pass these pairs on for lemmatization, only the first value is getting lemmatized.
Dataframe with two columns-
ID Text
1 Lemmatization is an interesting part
After tokenizing and removing stop words -
ID Tokenize_data
1 'Lemmatization', 'interesting', 'part'
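For reference, a minimal sketch of how a Tokenize_data column like this can be built (the stop-word list and exact column names here are assumptions, not from the original post):
import nltk
import pandas as pd
from nltk.corpus import stopwords

#requires the NLTK 'punkt' and 'stopwords' data packages
df2 = pd.DataFrame({'ID': [1], 'Text': ['Lemmatization is an interesting part']})
stop_words = set(stopwords.words('english'))

#tokenize each sentence and drop stop words, keeping a list of tokens per row
df2['tokenize_data'] = df2['Text'].apply(
    lambda s: [w for w in nltk.word_tokenize(s) if w.lower() not in stop_words])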
#Lemmatization with POS tag
#Part of Speech Tagging
df2['tag_words'] = df2.tokenize_data.apply(nltk.pos_tag)

#Treebank to WordNet
from nltk.corpus import wordnet

def get_wordnet_pos(treebank_tag):
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        return None

from nltk.stem.wordnet import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

def tagging(text):
    #tagged = nltk.pos_tag(tokens)
    for (word, tag) in text:
        wntag = get_wordnet_pos(tag)
        if wntag is None:  #do not supply a tag in case of None
            lemma = lemmatizer.lemmatize(word)
        else:
            lemma = lemmatizer.lemmatize(word, pos=wntag)
        return lemma

tag1 = lambda x: tagging(x)
df2['lemma_tag'] = df2.tag_words.apply(tag1)
The output comes out as -
ID Lemma_words
1 'Lemmatize'
Expected -
ID Lemma_words
1 'Lemmatize', 'interest', 'part'
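The reason only one value comes back is that return lemma executes on the first pass through the loop, so the remaining tuples are never processed. A minimal sketch of an in-place fix (reusing get_wordnet_pos and lemmatizer from above) would be to collect the lemmas in a list:
def tagging(text):
    lemmas = []
    for (word, tag) in text:
        wntag = get_wordnet_pos(tag)
        if wntag is None:
            #no usable WordNet tag, lemmatize without a POS hint
            lemmas.append(lemmatizer.lemmatize(word))
        else:
            lemmas.append(lemmatizer.lemmatize(word, pos=wntag))
    return lemmas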
Upvotes: 0
Views: 308
Reputation: 223
The function below works -
My code was not retaining the values of all the tuples inside the POS tag list, so only one value was coming out in the output.
import nltk
from nltk.stem.wordnet import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

#nltk_tag_to_wordnet_tag is the Treebank-to-WordNet mapping (same role as get_wordnet_pos above)
def lemmatize_sentence(text):
    #tokenize the sentence and find the POS tag for each token
    nltk_tagged = nltk.pos_tag(nltk.word_tokenize(text))
    #tuples of (token, wordnet_tag)
    wordnet_tagged = map(lambda x: (x[0], nltk_tag_to_wordnet_tag(x[1])), nltk_tagged)
    lemmatized_sentence = []
    for word, tag in wordnet_tagged:
        if tag is None:
            #if there is no available tag, append the token as is
            lemmatized_sentence.append(word)
        else:
            #else use the tag to lemmatize the token
            lemmatized_sentence.append(lemmatizer.lemmatize(word, tag))
    return lemmatized_sentence
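For completeness, a minimal usage sketch: nltk_tag_to_wordnet_tag is not defined in the snippet above, but it plays the same role as get_wordnet_pos from the question, so one way to wire it up (the lemma_words column name is an assumption, and lemmatize_sentence is the function defined above) is:
import pandas as pd
from nltk.corpus import wordnet

#same Treebank-to-WordNet mapping as get_wordnet_pos in the question
def nltk_tag_to_wordnet_tag(nltk_tag):
    if nltk_tag.startswith('J'):
        return wordnet.ADJ
    elif nltk_tag.startswith('V'):
        return wordnet.VERB
    elif nltk_tag.startswith('N'):
        return wordnet.NOUN
    elif nltk_tag.startswith('R'):
        return wordnet.ADV
    else:
        return None

#apply to the raw Text column, since lemmatize_sentence tokenizes the sentence itself
df2 = pd.DataFrame({'ID': [1], 'Text': ['Lemmatization is an interesting part']})
df2['lemma_words'] = df2['Text'].apply(lemmatize_sentence)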
Upvotes: 1