slm

Reputation: 237

How to apply a function (BigramCollocationFinder) to a Pandas DataFrame

I am not very experienced with programming and need some help solving a problem. I have a .csv file with 4 columns and about 5k rows, filled with questions and answers. I want to find word collocations in each cell.

Starting point: Pandas dataframe with 4 columns and about 5k rows. (Id, Title, Body, Body2)

Goal: Dataframe with 7 columns (Id, Title, Title-Collocations, Body, Body-Collocations, Body2, Body2-Collocations), with a function applied to each of its rows.

I have found an example for bigram collocations in the NLTK documentation:

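import nltk
from nltk.collocations import BigramCollocationFinder

bigram_measures = nltk.collocations.BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(nltk.corpus.genesis.words('english-web.txt'))
finder.apply_freq_filter(3)  # ignore bigrams that occur fewer than 3 times
print(finder.nbest(bigram_measures.pmi, 5))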
>>>[('Beer', 'Lahai'), ('Lahai', 'Roi'), ('gray', 'hairs'), ('Most', 'High'), ('ewe', 'lambs')]

I want to adapt this function to my Pandas Dataframe. I am aware of the apply function for Pandas Dataframes, but can't manage to get it to work.

This is my test-approach for one of the columns:

df['Body-Collocation'] = df.apply(lambda df: BigramCollocationFinder.from_words(df['Body']),axis=1)

But if I print that out for an example row, I get:

print (df['Body-Collocation'][1])
>>> <nltk.collocations.BigramCollocationFinder object at 0x113c47ef0>

I am not even sure if this is the right way. Can someone point me in the right direction?

Upvotes: 5

Views: 2525

Answers (2)

slm

Reputation: 237

Thanks for the answer. I guess my question was not phrased perfectly, but your answer still helped me find a solution. Sometimes it's good to take a short break :-)

In case anyone is interested, this worked for me:

df['Body-Collocation'] = df.apply(lambda row: BigramCollocationFinder.from_words(row['Question-Tok']), axis=1)
df['Body-Collocation'] = df['Body-Collocation'].apply(lambda finder: finder.nbest(bigram_measures.pmi, 3))
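For the original goal of one collocation column per text column, the two steps above can probably be folded into a single helper. The following is only a minimal sketch: the helper name top_collocations, the sample rows, and the whitespace tokenization via str.split() are assumptions for illustration (the Question-Tok column above presumably held tokens produced some other way).

```python
import pandas as pd
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

bigram_measures = BigramAssocMeasures()


def top_collocations(text, n=3):
    """Return up to n bigrams from text, ranked by PMI (hypothetical helper)."""
    # Assumption: plain whitespace tokenization; swap in a real tokenizer if needed.
    tokens = str(text).split()
    finder = BigramCollocationFinder.from_words(tokens)
    return finder.nbest(bigram_measures.pmi, n)


# Made-up sample data with the same column names as in the question.
df = pd.DataFrame({
    "Id": [1, 2],
    "Title": ["how to use pandas apply", "nltk bigram finder question"],
    "Body": ["i want to find word collocations in each cell",
             "the apply function calls a function for each value"],
    "Body2": ["new york is a big city", "i live in new york city"],
})

# Add one collocation column per text column.
for col in ["Title", "Body", "Body2"]:
    df[col + "-Collocations"] = df[col].apply(top_collocations)

print(df[["Id", "Title-Collocations"]])
```

Note that each cell is scored on its own, so with short texts most bigrams occur only once and the PMI ranking carries little information. The new columns are appended at the end; they can be reordered afterwards if the interleaved layout matters.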

Upvotes: 3

Stefan

Reputation: 42875

If you want to apply BigramCollocationFinder.from_words() to each value in the Body column, you'd have to do:

df['Body-Collocation'] = df.Body.apply(lambda x: BigramCollocationFinder.from_words(x))

In essence, apply loops through the rows and passes each value of the Body column to the applied function.
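One thing to watch for: from_words() expects a sequence of words. If the Body cells hold raw strings rather than token lists, the finder iterates over the string character by character. A small sketch of the difference, using made-up sample text and str.split() as an assumed stand-in for a real tokenizer:

```python
import pandas as pd
from nltk.collocations import BigramCollocationFinder

# Made-up sample data; the real Body column may look different.
df = pd.DataFrame({"Body": ["new york is big", "i love new york"]})

# Raw string: from_words iterates over characters, so the bigrams are letter pairs.
char_finders = df["Body"].apply(lambda x: BigramCollocationFinder.from_words(x))

# Token list: the bigrams are word pairs, which is usually what is wanted.
word_finders = df["Body"].apply(lambda x: BigramCollocationFinder.from_words(x.split()))

# ngram_fd holds the bigram counts collected by each finder.
print(list(char_finders[0].ngram_fd)[:3])
print(list(word_finders[0].ngram_fd)[:3])
```

Either way, the column then contains finder objects; calling a method such as nbest() on each one is what turns them into lists of bigrams.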

But as suggested in the comments, providing a data sample would make it easier to address your specific case.

Upvotes: 3
