Reputation: 237
I am not very used to programming and need some help solving a problem. I have a .csv file with 4 columns and about 5k rows, filled with questions and answers. I want to find word collocations in each cell.
Starting point: Pandas dataframe with 4 columns and about 5k rows. (Id, Title, Body, Body2)
Goal: Dataframe with 7 columns (Id, Title, Title-Collocations, Body, Body-Collocations, Body2, Body2-Collocations), where each collocation column is produced by applying a function to the corresponding text column, row by row.
I have found an example for bigram collocations in the NLTK documentation.
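import nltk
from nltk.collocations import BigramCollocationFinder

bigram_measures = nltk.collocations.BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(nltk.corpus.genesis.words('english-web.txt'))
finder.apply_freq_filter(3)  # ignore bigrams that occur fewer than 3 times
print(finder.nbest(bigram_measures.pmi, 5))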
>>>[('Beer', 'Lahai'), ('Lahai', 'Roi'), ('gray', 'hairs'), ('Most', 'High'), ('ewe', 'lambs')]
I want to adapt this function to my Pandas dataframe. I am aware of the apply function for Pandas dataframes, but can't manage to get it to work.
This is my test-approach for one of the columns:
df['Body-Collocation'] = df.apply(lambda df: BigramCollocationFinder.from_words(df['Body']),axis=1)
but if I print that out for an example row, I get
print (df['Body-Collocation'][1])
>>> <nltk.collocations.BigramCollocationFinder object at 0x113c47ef0>
I am not even sure if this is the right approach. Can someone point me in the right direction?
Upvotes: 5
Views: 2525
Reputation: 237
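Thanks for the answer. I guess the question I asked was not perfectly phrased, but your answer still helped me find a solution. Sometimes it's good to take a short break :-)
In case someone is interested in the answer, this worked for me. The first line builds a finder per row from the already tokenized column Question-Tok, and the second line extracts the 3 best bigrams by PMI from each finder: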
df['Body-Collocation'] = df.apply(lambda df: BigramCollocationFinder.from_words(df['Question-Tok']),axis=1)
df['Body-Collocation'] = df['Body-Collocation'].apply(lambda df: df.nbest(bigram_measures.pmi, 3))
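For the full goal from the question (one collocation column per text column), the two steps can probably be folded into one helper. This is only a minimal sketch under some assumptions: the helper name top_collocations, the toy data, the hyphenated column names and the plain whitespace tokenization (lowercase, then str.split) are all made up for illustration and are not from the original post. A real tokenizer would likely give better results.

```python
import pandas as pd
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

bigram_measures = BigramAssocMeasures()


def top_collocations(tokens, n=3, min_freq=1):
    """Return up to n bigrams with the highest PMI for one list of tokens.

    Hypothetical helper, not part of NLTK.
    """
    finder = BigramCollocationFinder.from_words(tokens)
    # Drop bigrams that occur fewer than min_freq times in this cell.
    finder.apply_freq_filter(min_freq)
    return finder.nbest(bigram_measures.pmi, n)


# Toy stand-in for the real 5k-row dataframe (assumed content).
df = pd.DataFrame({
    "Id": [1, 2],
    "Title": ["How to find word collocations", "Pandas apply question"],
    "Body": ["new york is big and new york is busy", ""],
    "Body2": ["red wine and white wine and red wine", None],
})

for col in ["Title", "Body", "Body2"]:
    # Assumed tokenization: lowercase and split on whitespace.
    tokens = df[col].fillna("").str.lower().str.split()
    df[col + "-Collocations"] = tokens.apply(top_collocations)

# Put each collocation column next to its source column.
df = df[["Id", "Title", "Title-Collocations", "Body", "Body-Collocations",
         "Body2", "Body2-Collocations"]]
print(df.head())
```

Empty or missing cells end up with an empty list, because a finder built from no tokens has no bigrams to rank. With short cells, PMI tends to favour bigrams made of rare words, so raising min_freq may be worth trying on longer texts.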
Upvotes: 3
Reputation: 42875
If you want to apply BigramCollocationFinder.from_words() to each value in the Body column, you'd have to do:
df['Body-Collocation'] = df.Body.apply(lambda x: BigramCollocationFinder.from_words(x))
In essence, apply allows you to loop through the rows and provide the corresponding value of the Body column to the applied function.
But as suggested in the comments, providing a data sample would make it easier to address your specific case.
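One thing that may matter here, depending on what the Body cells contain: from_words() iterates over whatever it is given, so a raw string is read character by character, while a list of tokens is read word by word. The small sketch below illustrates the difference. The sample sentence and the whitespace split are assumptions for illustration, not taken from the question's data.

```python
from nltk.collocations import BigramCollocationFinder

text = "new york is big and new york is busy"  # made-up sample cell

# Passing the raw string: the "words" are single characters.
char_finder = BigramCollocationFinder.from_words(text)

# Passing a token list (here a plain whitespace split): real word bigrams.
word_finder = BigramCollocationFinder.from_words(text.split())

# ngram_fd is the frequency distribution of the bigrams that were counted.
print(list(char_finder.ngram_fd)[:3])
print(word_finder.ngram_fd[("new", "york")])
```

If the column holds plain strings, tokenizing first (for example with df.Body.str.split() or a proper tokenizer) before calling apply is probably what you want.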
Upvotes: 3