Reputation: 2140
I have a data frame like this:
text, pos
No thank you. [(No, DT), (thank, NN), (you, PRP)]
They didn't respond [(They, PRP), (didn't, VBP), (respond, JJ)]
I want to apply a function to the pos column and save the result in a new column, so the output would look like this:
text, pos, score
No thank you. [(No, DT), (thank, NN), (you, PRP)] [[0.0, 0.0, 1.0], [], [0.5, 0.0, 0.45]]
They didn't respond [(They, PRP), (didn't, VBP), (respond, JJ)] [[0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]
So the function returns a list for each tuple in the list. The implementation of the function is not the point here; for that I just call get_sentiment.
I can do it with a nested loop, but I don't like that. I want to do it in a more Pythonic, Pandas DataFrame way.
This is what I have tried so far:
df['score'] = df['pos'].apply(lambda k: [get_sentiment(x,y) for j in k for (x,y) in j])
However, it raises this error:
ValueError: too many values to unpack (expected 2)
There are a couple of questions about this on SO, but the answers were in R.
For more clarity: get_sentiment is a function in NLTK that assigns a list of scores to each word (the list is [positive score, negative score, objectivity score]). Overall, I need to apply that function to the pos column of my DataFrame.
Upvotes: 1
Views: 4931
Reputation: 61509
Let's take Pandas out of the equation and create a minimal reproducible example of the problem - which is to do with the lambda itself:
def mock_sentiment(word, pos):
    return len(word) * 0.1, 0, len(pos) * 0.1
data = [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')]
[mock_sentiment(x, y) for j in data for (x,y) in j] # reproduces the error
The problem is that each j in data (e.g. ('No', 'DT')) is a single tuple that we want to unpack into x, y values. By iterating over j, we get individual strings ('No' and 'DT'), which we then attempt to unpack into x and y. This happens to work for 'No' and 'DT', because each is exactly two characters long, but not for strings of other lengths - and even then, it's not the desired result.
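To make the point about string lengths concrete, here is a small sketch in plain Python. The variable names (pairs, error_message) are only for illustration and are not part of the original code.

```python
# A two-character string unpacks "successfully" into its two characters,
# which is why the nested comprehension does not fail on ('No', 'DT').
pairs = [(x, y) for j in [('No', 'DT')] for (x, y) in j]
print(pairs)  # the characters of each string, not the (word, tag) pair

# A string of any other length cannot be unpacked into exactly two names.
try:
    [(x, y) for j in [('thank', 'NN')] for (x, y) in j]
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

So the error only appears once the comprehension reaches a word or tag that is not two characters long.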
Since j is already the tuple that we want to unpack, what we want to do is unpack it there, by using (x, y) rather than j for the iteration, and not have any nested comprehension:
[mock_sentiment(x, y) for (x, y) in data] # works as expected
Consequently, that is what we want the lambda to give back to Pandas in the real code (substituting back in your names and the real sentiment function):
df['score'] = df['pos'].apply(lambda k: [get_sentiment(x, y) for (x, y) in k])
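For an end-to-end check, here is a minimal sketch that rebuilds a DataFrame shaped like the one in the question. It assumes the pos column holds lists of (word, tag) tuples, and it uses mock_sentiment as a hypothetical stand-in for the real get_sentiment, so the numbers it produces are placeholders and not real sentiment scores.

```python
import pandas as pd

def mock_sentiment(word, pos):
    # Hypothetical stand-in for get_sentiment: returns three placeholder numbers.
    return [len(word) * 0.1, 0.0, len(pos) * 0.1]

# Assumed shape of the data: each 'pos' cell is a list of (word, tag) tuples.
df = pd.DataFrame({
    'text': ['No thank you.', "They didn't respond"],
    'pos': [
        [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')],
        [('They', 'PRP'), ("didn't", 'VBP'), ('respond', 'JJ')],
    ],
})

# One list of scores per (word, tag) tuple, one list of those per row.
df['score'] = df['pos'].apply(lambda k: [mock_sentiment(x, y) for (x, y) in k])
print(df[['text', 'score']])
```

Each cell of score then holds one inner list per tuple in the matching pos cell.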
Upvotes: 2
Reputation: 323226
In your case, you can index into each tuple instead of unpacking it:
df['score'] = df['pos'].apply(lambda k: [get_sentiment(j[0], j[1]) for j in k])
Upvotes: 2