sariii

Reputation: 2140

How to do a nested loop using apply in pandas

I have a data frame like this:

text,                pos
No thank you.        [(No, DT), (thank, NN), (you, PRP)]
They didn't respond  [(They, PRP), (didn't, VBP), (respond, JJ)]

I want to apply a function to pos and save the result in a new column, so the output would look like this:

text,                pos                                          score
No thank you.        [(No, DT), (thank, NN), (you, PRP)]          [[0.0, 0.0, 1.0], [], [0.5, 0.0, 0.45]]
They didn't respond  [(They, PRP), (didn't, VBP), (respond, JJ)]  [[0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]

The function returns a list for each tuple in the list (the implementation of the function is not the point here, so I just call it get_sentiment). I can do this with a nested loop, but I don't like that. I want to do it in a more Pythonic, pandas DataFrame way.

This is what I have tried so far:

df['score'] = df['pos'].apply(lambda k: [get_sentiment(x,y) for j in k for (x,y) in j])

However, it raises this error:

ValueError: too many values to unpack (expected 2)

There are a couple of similar questions on SO, but the answers were in R.

For more clarity:

get_sentiment is a function in NLTK that assigns a list of scores to each word (the list is [positive score, negative score, objectivity score]). Overall, I need to apply that function to the pos column of my DataFrame.

Upvotes: 1

Views: 4931

Answers (2)

Karl Knechtel

Reputation: 61509

Let's take Pandas out of the equation and create a minimal reproducible example of the problem - which has to do with the lambda itself:

def mock_sentiment(word, pos):
    return len(word) * 0.1, 0, len(pos) * 0.1

data = [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')]

[mock_sentiment(x, y) for j in data for (x,y) in j] # reproduces the error

The problem is that each j in data (e.g. ('No', 'DT')) is already a single tuple that we want to unpack into x, y values. By iterating over j, we get the individual strings ('No' and 'DT'), and each of those is what we then attempt to unpack into x and y. This happens to work for two-character strings such as 'No' and 'DT', but not for strings of other lengths - and even then, it's not the desired result.
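To see exactly where the unpacking goes wrong, the nested comprehension can be written out as explicit loops. This is only a sketch reusing the sample data above; the names trace and error are made up for illustration and are not part of the original code.

```python
data = [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')]

trace = []    # illustrative helper: records every unpacking that succeeded
error = None  # illustrative helper: holds the error message, if any

try:
    for j in data:        # j is one (word, tag) tuple
        for item in j:    # item is a single string such as 'No'
            x, y = item   # only succeeds when the string has exactly 2 characters
            trace.append((x, y))
except ValueError as exc:
    error = str(exc)

print(trace)  # the "successful" unpackings are single characters, not (word, tag)
print(error)  # the message begins with "too many values to unpack"
```

The loop gets through 'No' and 'DT' by splitting them into single characters, then fails on 'thank', which is the error reported in the question.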

Since j is already the tuple that we want to unpack, what we want to do is unpack it there, by using (x, y) rather than j for the iteration, and not have any nested comprehension:

[mock_sentiment(x, y) for (x, y) in data] # works as expected

Consequently, that is what we want the lambda to give back to Pandas in the real code (substituting back in your names and the real sentiment function):

df['score'] = df['pos'].apply(lambda k: [get_sentiment(x, y) for (x, y) in k])
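One way to picture this line: apply plays the role of the outer loop (one call per row), and the comprehension inside the lambda is the inner loop (one call per tuple). The sketch below mimics that with plain lists instead of a DataFrame, so it runs without Pandas; rows is a made-up substitute for the pos column, score_row is a hypothetical name for the lambda, and mock_sentiment is the same stand-in as above.

```python
def mock_sentiment(word, pos):
    # Stand-in for the real get_sentiment; returns a 3-tuple of fake scores
    return len(word) * 0.1, 0, len(pos) * 0.1

# Made-up substitute for df['pos']: one list of (word, tag) tuples per row
rows = [
    [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')],
    [('They', 'PRP'), ("didn't", 'VBP'), ('respond', 'JJ')],
]

def score_row(k):
    # Inner loop: one score per (word, tag) tuple in a single row
    return [mock_sentiment(x, y) for (x, y) in k]

# Outer loop: this is the part that .apply() does for each row of the column
scores = [score_row(k) for k in rows]

for row_scores in scores:
    print(row_scores)
```

Each entry of scores lines up with one row, and each row holds one result per tuple, which is the shape the question asks for in the score column.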

Upvotes: 2

BENY

Reputation: 323226

In your case, you can index into each tuple instead of unpacking it:

df['score'] = df['pos'].apply(lambda k: [get_sentiment(j[0], j[1]) for j in k])
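Indexing with j[0] and j[1] avoids tuple unpacking in the loop header. Assuming every element really is a two-item tuple, argument unpacking with * should give the same result. A small check with a hypothetical mock_sentiment standing in for get_sentiment:

```python
def mock_sentiment(word, pos):
    # Hypothetical stand-in for get_sentiment
    return len(word) * 0.1, 0, len(pos) * 0.1

# One row of the pos column
k = [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')]

by_index = [mock_sentiment(j[0], j[1]) for j in k]    # index into each tuple
by_star = [mock_sentiment(*j) for j in k]             # unpack at the call site
by_unpack = [mock_sentiment(x, y) for (x, y) in k]    # unpack in the loop header

print(by_index == by_star == by_unpack)
```

All three forms loop over k once; they differ only in where the tuple gets split into its two arguments.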

Upvotes: 2
