Kabilesh

Reputation: 1012

Create bigrams from list of sentences in pandas dataframe

After some preprocessing I have a dataframe like the one shown below. I want to create bigrams from each list in each row of the dataframe. My attempt is given below, but it fails with this error:

lambda row: list((map(ngrams(2), row))))
TypeError: ngrams() missing 1 required positional argument: 'n'

What should the first argument to ngrams be here, and how should I modify this code?

I may end up asking about every function I write, but I am having a hard time understanding the lambda and map functions I am using. Could you please explain how I should apply lambda and map to this dataframe in the future?

Dataframe

 [[ive, searching, right, word, thank, breather], [i, promise, wont, take, help, granted, fulfil, promise], [you, wonderful, blessing, time]]                       

 [[free, entry, 2, wkly, comp, win, fa, cup, final, tkts, 21st, may, 2005], [text, fa, 87121, receive, entry, questionstd, txt, ratetcs, apply, 08452810075over18s]]

 [[nah, dont, think, go, usf, life, around, though]]                                                                                                                

 [[even, brother, like, speak, me], [they, treat, like, aid, patent]]                                                                                               

 [[i, date, sunday, will], []]  

What I need

 [(even, brother), (brother,like), (like,speak), (speak,me), (they, treat), (treat,like), (like,aid), (aid,patent)]  

What I tried

 def toBigram(fullCorpus):
    bigram = fullCorpus['lemmatized'].apply(
       lambda row: list((map(ngrams(2), row))))
    return bigram

Upvotes: 0

Views: 1798

Answers (1)

DYZ

Reputation: 57033

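When you call map, the first argument must be a function itself (any callable), not the result of calling a function. ngrams(2) is a function call, and it fails because ngrams requires two arguments: the sequence and n. You also cannot pass ngrams to map on its own, because map would call it with only the sequence and n would still be missing. Either define a lambda function: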

lambda row: list(map(lambda x:ngrams(x,2), row))

Or use a list comprehension:

lambda row: [ngrams(x,2) for x in row]

Or use the bigrams function, which is also part of NLTK:

lambda row: list(map(bigrams, row))
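Note that the variants above give one lazy bigram iterator per sentence, while the desired output in the question is a single flat list of tuples per row. A minimal sketch of one way to get that shape follows. To keep it runnable without NLTK, it defines a small stand-in named ngrams that only mimics the (sequence, n) argument order of nltk.util.ngrams; the helper name to_bigrams and the sample row are assumptions made for illustration. It also shows functools.partial as another way to hand map a callable instead of a call.

```python
from functools import partial
from itertools import chain


def ngrams(tokens, n):
    """Stand-in (an assumption for this sketch) that mimics the
    (sequence, n) argument order of nltk.util.ngrams."""
    return zip(*(tokens[i:] for i in range(n)))


def to_bigrams(row):
    """Flatten the bigrams of every sentence in one row into a single list."""
    return list(chain.from_iterable(ngrams(sentence, 2) for sentence in row))


# One dataframe cell: a list of tokenized sentences (taken from the question).
row = [["even", "brother", "like", "speak", "me"],
       ["they", "treat", "like", "aid", "patent"]]

# map needs a callable. partial(ngrams, n=2) builds one without calling
# ngrams yet; map then supplies each sentence as the first argument.
per_sentence = [list(pairs) for pairs in map(partial(ngrams, n=2), row)]

# Flat list of tuples, matching the "What I need" shape in the question.
flat = to_bigrams(row)
print(flat)
```

With the dataframe from the question, this would presumably be applied as fullCorpus['lemmatized'].apply(to_bigrams), assuming every cell holds a list of token lists. When NLTK is installed, the stand-in can be dropped in favor of from nltk.util import ngrams.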

Upvotes: 2
