How to apply a function (BigramCollocationFinder) to a Pandas DataFrame

Question

I am not very experienced with programming and need some help solving a problem. I have a .csv file with 4 columns and about 5k rows, filled with questions and answers. I want to find word collocations in each cell.

Starting point: a Pandas DataFrame with 4 columns and about 5k rows (Id, Title, Body, Body2).

Goal: a DataFrame with 7 columns (Id, Title, Title-Collocations, Body, Body-Collocations, Body2, Body2-Collocations), produced by applying a function to each of its rows.

I found an example of bigram collocations in the NLTK documentation:

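import nltk
from nltk.collocations import BigramCollocationFinder
bigram_measures = nltk.collocations.BigramAssocMeasures()
# The finder has to be built before it can be filtered
finder = BigramCollocationFinder.from_words(nltk.corpus.genesis.words('english-web.txt'))
# Drop bigrams that occur fewer than 3 times
finder.apply_freq_filter(3)
print(finder.nbest(bigram_measures.pmi, 5))
>>>[('Beer', 'Lahai'), ('Lahai', 'Roi'), ('gray', 'hairs'), ('Most', 'High'), ('ewe', 'lambs')]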
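To try the same calls without downloading the genesis corpus, here is a minimal sketch that assumes only that nltk is installed. The word list is a made-up stand-in for one tokenized cell, and the frequency threshold of 2 is chosen only because the sample is tiny.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Hypothetical sample: an already tokenized list of words
words = "new york is big and new york is busy".split()

bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(words)

# ngram_fd is a frequency distribution over adjacent word pairs
print(finder.ngram_fd[("new", "york")])  # 2

# Keep only bigrams seen at least twice, then rank the survivors by PMI
finder.apply_freq_filter(2)
best = finder.nbest(bigram_measures.pmi, 5)
print(best)
```

Only the two pairs that occur twice survive the filter here, so nbest() returns fewer than 5 results.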

I want to apply this to my Pandas DataFrame. I am aware of the apply function for Pandas DataFrames, but can't manage to get it to work.

This is my test approach for one of the columns:

df['Body-Collocation'] = df.apply(lambda df: BigramCollocationFinder.from_words(df['Body']),axis=1)

But if I print it for an example row, I get:

print (df['Body-Collocation'][1])
>>>

I am not even sure if this is the right way. Can someone point me in the right direction?

Stefan · Accepted Answer

If you want to apply BigramCollocationFinder.from_words() to each value in the Body column, you'd have to do:

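from nltk.collocations import BigramCollocationFinder
df['Body-Collocation'] = df.Body.apply(lambda x: BigramCollocationFinder.from_words(x.split()))  # from_words expects a list of words, not a raw string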

In essence, apply loops through the values of the Body column, passes each one to the applied function, and collects the results in a new column.
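The text needs to be split into words first because from_words() simply iterates over its argument, and iterating over a Python string yields single characters. The following sketch, assuming that whitespace splitting is a good enough tokenizer for this data, shows the difference:

```python
from nltk.collocations import BigramCollocationFinder

text = "to be or not to be"

# Raw string: from_words() sees individual characters, so it counts character pairs
char_finder = BigramCollocationFinder.from_words(text)
print(("t", "o") in char_finder.ngram_fd)  # True

# List of words: from_words() counts adjacent word pairs instead
word_finder = BigramCollocationFinder.from_words(text.split())
print(word_finder.ngram_fd[("to", "be")])  # 2
```

nltk.word_tokenize() is a more careful alternative to str.split(), but it depends on tokenizer data that has to be downloaded separately.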

But as suggested in the comments, providing a data sample would make it easier to address your specific case.
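The cells produced by a call like the one above hold BigramCollocationFinder objects rather than collocations, so nbest() still has to be called for each cell. Below is a sketch of the full goal from the question under a few assumptions: top_bigrams() and add_collocation_columns() are hypothetical helper names, text is lowercased and split on whitespace, missing cells yield an empty list, and the small DataFrame is made-up sample data standing in for the real .csv.

```python
import pandas as pd
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

BIGRAM_MEASURES = BigramAssocMeasures()


def top_bigrams(text, n=5, min_freq=1):
    """Hypothetical helper: return the n highest-PMI word bigrams of one cell."""
    if not isinstance(text, str):
        # Missing cells (None/NaN) become an empty list instead of raising
        return []
    # Assumption: lowercasing plus whitespace splitting is adequate tokenization
    finder = BigramCollocationFinder.from_words(text.lower().split())
    finder.apply_freq_filter(min_freq)
    return finder.nbest(BIGRAM_MEASURES.pmi, n)


def add_collocation_columns(df, columns=("Title", "Body", "Body2")):
    """Hypothetical helper: insert '<column>-Collocations' right after each text column."""
    out = df.copy()
    for col in columns:
        position = out.columns.get_loc(col) + 1
        out.insert(position, f"{col}-Collocations", out[col].apply(top_bigrams))
    return out


# Made-up sample data with the question's four columns
sample = pd.DataFrame({
    "Id": [1, 2],
    "Title": ["new york trip", "pandas apply question"],
    "Body": ["new york is big and new york is busy", "apply a function to each row"],
    "Body2": ["see new york", None],
})
result = add_collocation_columns(sample)
print(list(result.columns))
print(result.loc[0, "Body-Collocations"])
```

With the real data this would be df = add_collocation_columns(df) after loading the file with pd.read_csv(). On short cells PMI tends to rank pairs that occur only once highest, which is why the NLTK example filters with apply_freq_filter(3); passing a higher threshold (for example out[col].apply(top_bigrams, min_freq=3)) can help on longer texts but will leave short cells with an empty list.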
