Reputation: 696
I was expecting to get whole rows when using a lambda function inside apply on a Pandas DataFrame, but it looks like I'm getting a "single element".
Look at this code:
import pandas as pd

# Data sample
reviews_2 = pd.DataFrame({
'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
print(reviews_2)
mean_price_2 = reviews_2.price.mean() # the value used for centering
def remean_points(row):
    # subtract the mean price from this row's price and return the whole row
    row.price = row.price - mean_price_2
    return row
centered_price_2 = reviews_2.apply(remean_points, axis='columns') # returns a DataFrame
print(centered_price_2)
That "apply" returns a DataFrame. That is my expected output!
So, I tried to use a lambda function, doing:
reviews_2 = pd.DataFrame({
'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
print(reviews_2)
mean_price_2 = reviews_2.price.mean()
centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # returns a Series!
print(centered_price_2)
But now, "apply" returns a Series!
I know that apply tries to infer the output type.
I was expecting to get a row back, but it seems to return a "single element"...
So my question: shouldn't p in the lambda function be a row?
Interesting: if I do
centered_price_2 = reviews_2.apply(lambda p: p, axis='columns')
I get a DataFrame...
Still: how can I use lambda and apply together and be sure about the output type?!
Upvotes: 1
Views: 1915
Reputation: 696
I asked this question in 2020, and now, in 2024, reviewing my open questions, I understand Pandas a bit more (just a bit)!
So...
My mistake was here:
mean_price_2 = reviews_2.price.mean()
centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # returns a Series!
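Let me explain:
apply tries to infer the output type from what the function returns. mean_price_2 = reviews_2.price.mean() is a scalar (a float). With axis='columns', p is one row at a time, passed in as a Series, so p.price is a single value and p.price - mean_price_2 is a single value too. apply collects one such value per row, so reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') returns a Series!
In 2020, I wrongly thought lambda p: ... should always produce a DataFrame since p is a row of a DataFrame. That only happens when the function returns the row itself (or another Series), as remean_points and lambda p: p do.
The type returned by the lambda comes from the evaluated expression...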
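To make the point above concrete, here is a minimal sketch using the same sample data. The names as_series and as_frame are just illustrative, they are not from my original code.

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
mean_price_2 = reviews_2.price.mean()  # a scalar; the missing price is skipped

# The lambda returns one scalar per row, so apply builds a Series.
as_series = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns')

# The lambda returns the row itself (a Series), so apply builds a DataFrame.
as_frame = reviews_2.apply(lambda p: p, axis='columns')

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

So p is a row in both cases; what changes is what the lambda hands back to apply.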
One solution to fix my code would be:
reviews_2 = pd.DataFrame({
'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
print(reviews_2)
mean_price_2 = reviews_2.price.mean()
# note the next two lines
centered_price_2 = reviews_2.copy() # copy the DataFrame; plain assignment would only alias reviews_2
centered_price_2['price'] = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # only change the desired column!
print(centered_price_2)
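On the "be sure about the output type" part: as far as I understand, DataFrame.apply also takes a result_type argument (only used with axis='columns') that lets you state what you want when the function returns a list-like. A small sketch, where expanded and reduced are just illustrative names:

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
mean_price_2 = reviews_2.price.mean()

def centered_and_points(p):
    # returns a plain list: [centered price, points]
    return [p.price - mean_price_2, p.points]

# 'expand' turns each list into columns, so the result is a DataFrame.
expanded = reviews_2.apply(centered_and_points, axis='columns', result_type='expand')

# 'reduce' keeps each list as a single element, so the result is a Series.
reduced = reviews_2.apply(centered_and_points, axis='columns', result_type='reduce')

print(type(expanded).__name__, expanded.shape)
print(type(reduced).__name__, len(reduced))
```

This does not change what p is; it only tells apply how to assemble the per-row results.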
Happy 2024!
Upvotes: 0
Reputation: 1875
It's not very clear what exact output you expect, so I hope this is what you're looking for.
The newcol column will hold the price minus the mean price.
>>> reviews_2['newcol'] = reviews_2['price'].apply(lambda x: x - reviews_2.price.mean())
>>> reviews_2
   price   country  points  newcol
0    NaN     Italy      87     NaN
1   15.0  Portugal      87     1.0
2   14.0        US      87     0.0
3   13.0        US      87    -1.0
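As a side note, the same column can probably be computed without apply at all, since subtracting a scalar from a Series works element-wise. This is a sketch of that alternative, not a change to the approach above:

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})

# Series minus scalar is applied to every element; the mean is computed once.
reviews_2['newcol'] = reviews_2['price'] - reviews_2['price'].mean()
print(reviews_2)
```

It also avoids recomputing the mean for every element, which the lambda version does.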
Upvotes: 1