NLTK and Pandas - adding synsets into a list

Question

I want to create a list that is added as a new row to a dataframe.

import nltk
import pandas as pd
import numpy as np
from nltk.corpus import wordnet


Overviewdataframe = pd.DataFrame([])
synonyms = []

for syn in wordnet.synsets("active"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        Overviewdataframe = Overviewdataframe.append(synonyms)
        synonyms = []

Instead, the list is added as a column rather than a row. Can you please help?

Thank you.

alvas · Accepted Answer

TL;DR

from itertools import chain

import pandas as pd
from nltk.corpus import wordnet as wn

wordlist = ['active', 'fan', 'hop', 'grace']

words2lemmanames = [{'word': word, 'synset':ss.name(), 'lemma_names':ss.lemma_names()}
                    for word in wordlist for ss in wn.synsets(word)]
pd.DataFrame(words2lemmanames)
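The comprehension above has two `for` clauses, so it yields one dictionary per (word, synset) pair, with the first clause acting as the outer loop. Below is a minimal pure-Python sketch of that flattening order. It assumes a hypothetical `fake_synsets` lookup table standing in for `wn.synsets(word)`, so it runs without the WordNet corpus; the synset names in it are only illustrative.

```python
# Hypothetical stand-in for wn.synsets(word): maps each word to synset names.
fake_synsets = {
    'active': ['active_agent.n.01', 'active_voice.n.01'],
    'fan': ['fan.n.01'],
}

wordlist = ['active', 'fan']

# Same shape as the TL;DR comprehension: the first `for` is the outer loop,
# so every synset of 'active' is emitted before any synset of 'fan'.
records = [{'word': word, 'synset': name}
           for word in wordlist for name in fake_synsets[word]]

for row in records:
    print(row)
```

Each dictionary becomes one row once the list is passed to `pd.DataFrame`, which is why the real version ends up with one row per synset.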

In Long

When you query a word through NLTK's WordNet interface, it returns a list of "concepts", also known as "synsets":

>>> wn.synsets('active')

[Synset('active_agent.n.01'), Synset('active_voice.n.01'), Synset('active.n.03'), Synset('active.a.01'), Synset('active.s.02'), Synset('active.a.03'), Synset('active.s.04'), Synset('active.a.05'), Synset('active.a.06'), Synset('active.a.07'), Synset('active.s.08'), Synset('active.a.09'), Synset('active.a.10'), Synset('active.a.11'), Synset('active.a.12'), Synset('active.a.13'), Synset('active.a.14')]

Each synset has its own list of lemma names, e.g.:

>>> wn.synsets('active')[0].lemma_names()
['active_agent', 'active']

You can also access a synset directly by its "name". The usual convention for the name is (i) the first lemma name, then a dot, (ii) the POS tag, then a dot, and (iii) the index number (e.g. `01`).

>>> wn.synsets('active')[0] == wn.synset('active_agent.n.01')
True
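Since the name follows that `lemma.pos.index` pattern, it can be taken apart with plain string methods. The sketch below uses a hypothetical helper, `split_synset_name`, which is not part of NLTK. It splits from the right with `maxsplit=2`, so a lemma that itself contains a dot would stay intact.

```python
def split_synset_name(name):
    """Split a synset name such as 'active_agent.n.01' into its parts.

    Hypothetical helper (not an NLTK API): split from the right at most
    twice, so only the POS tag and index are separated off.
    """
    lemma, pos, index = name.rsplit('.', 2)
    return lemma, pos, int(index)


print(split_synset_name('active_agent.n.01'))  # ('active_agent', 'n', 1)
print(split_synset_name('active.s.02'))        # ('active', 's', 2)
```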

Finally, given a list of dictionaries (one dictionary per row, mapping column names to values), you can feed it into pandas.DataFrame to convert it into a dataframe.
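The shape of the input decides whether values land in a row or a column. A flat list passed to the `DataFrame` constructor is read as a single column, which is likely why the loop in the question stacked the names vertically. The sketch below contrasts a flat list, a list wrapped in another list, and a list of dictionaries; the sample values are taken from the `active_agent.n.01` example above.

```python
import pandas as pd

lemma_names = ['active_agent', 'active']

# A flat list is read as one column: each element becomes its own row.
as_column = pd.DataFrame(lemma_names)
print(as_column.shape)  # (2, 1)

# Wrapping the list in another list turns it into a single row.
as_row = pd.DataFrame([lemma_names])
print(as_row.shape)  # (1, 2)

# A list of dictionaries gives one row per dictionary; keys become columns,
# and a list value (like lemma_names) is stored as-is in a single cell.
records = [
    {'word': 'active', 'synset': 'active_agent.n.01',
     'lemma_names': ['active_agent', 'active']},
]
df = pd.DataFrame(records)
print(df.columns.tolist())  # ['word', 'synset', 'lemma_names']
print(df.shape)             # (1, 3)
```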
