marcio

Reputation: 696

Pandas - Lambda inside apply to return a row

I expected to get whole rows back when using a lambda function inside apply on a Pandas DataFrame, but it looks like I'm getting a "single element" per row instead.

Look at this code:

import pandas as pd

# Data sample
reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0}, 
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'}, 
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})

print(reviews_2)

mean_price_2 = reviews_2.price.mean() # value used to center the prices

def remean_points(row):
    row.price = row.price - mean_price_2
    return row

centered_price_2 = reviews_2.apply(remean_points, axis='columns') # returns a DataFrame

print(centered_price_2)

That apply call returns a DataFrame, which is my expected output.

So I tried to use a lambda function instead:

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0}, 
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'}, 
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
print(reviews_2)

mean_price_2 = reviews_2.price.mean()

centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # returns a Series!

print(centered_price_2)

But now apply returns a Series!

I know that apply tries to infer the output type.
I was expecting to get a row back, but it seems to return a "single element" per row...

So my question:

Shouldn't p in the lambda function be a row?

Interesting:

If I do centered_price_2 = reviews_2.apply(lambda p: p, axis='columns'),
I get a DataFrame...

Still:

How can I use lambda with apply and be sure about the output type?

Upvotes: 1

Views: 1915

Answers (2)

marcio

Reputation: 696

I asked this question in 2020, and now, in 2024, reviewing my open questions, I understand Pandas a bit better (just a bit)!

So...

My mistake was here:

mean_price_2 = reviews_2.price.mean()

centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # returns a Series!

Let me explain:

  1. As I said at the time, apply infers its output type from what the function returns.
  2. mean_price_2 = reviews_2.price.mean() is a single number (a scalar), not a Series.
  3. With axis='columns', p is one row of the DataFrame, passed in as a Series. It is not the whole DataFrame.
  4. So p.price - mean_price_2 evaluates to a scalar, and apply collects one scalar per row into a Series.

In 2020, I wrongly thought that lambda p: ... should always produce a DataFrame because p is a row. The type returned by the lambda comes from the evaluated expression: returning the row itself (as in lambda p: p) gives a DataFrame, while returning a single value per row gives a Series.
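To make the return-type behaviour concrete, here is a minimal sketch using the same sample data. It is only an illustration: the names as_series and as_frame are made up for this example, and rebuilding the row through to_dict() is just one possible way to have a lambda return a whole row.

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': [None, 15.0, 14.0, 13.0],
    'country': ['Italy', 'Portugal', 'US', 'US'],
    'points': [87, 87, 87, 87],
})

# mean() reduces the column to one number; the missing price is skipped.
mean_price_2 = reviews_2['price'].mean()

# The lambda returns one scalar per row, so apply collects them into a Series.
as_series = reviews_2.apply(lambda p: p['price'] - mean_price_2, axis='columns')

# The lambda returns a whole row (a Series), so apply stacks the rows into a DataFrame.
# as_frame and the to_dict() rebuild are illustrative choices, not the only way.
as_frame = reviews_2.apply(
    lambda p: pd.Series({**p.to_dict(), 'price': p['price'] - mean_price_2}),
    axis='columns',
)

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame)
```

So the way to be sure about the output type is to control what the function returns: one value per row for a Series, a full row for a DataFrame.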

One solution to fix my code would be:

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0}, 
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'}, 
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})

print(reviews_2)

mean_price_2 = reviews_2.price.mean()

# note the next two lines
centered_price_2 = reviews_2.copy() # real copy; plain assignment would only alias reviews_2
centered_price_2['price'] = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # Only change the desired column!

print(centered_price_2)
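A side note on copying: plain assignment only binds a second name to the same DataFrame, while .copy() creates an independent one. The sketch below illustrates the difference on a tiny made-up frame (df, alias and independent are hypothetical names used only here).

```python
import pandas as pd

df = pd.DataFrame({'price': [15.0, 14.0, 13.0]})

alias = df               # second name for the very same object
independent = df.copy()  # separate DataFrame with its own data

# Writing through the alias also changes df, because they are one object.
alias['price'] = alias['price'] - 14.0

print(alias is df)
print(df['price'].tolist())
print(independent['price'].tolist())
```

If the original prices must stay available after centering, the .copy() form is the one to use.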

Happy 2024!

Upvotes: 0

Danail Petrov

Reputation: 1875

It's not entirely clear what output you expect, so I hope this is what you're looking for.

The newcol column will hold the price minus the mean price.

>>> reviews_2['newcol'] = reviews_2['price'].apply(lambda x: x - reviews_2.price.mean())
>>> reviews_2

   price   country  points  newcol
0    NaN     Italy      87     NaN
1   15.0  Portugal      87     1.0
2   14.0        US      87     0.0
3   13.0        US      87    -1.0
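For comparison, the same column can probably be computed without apply at all, since subtracting a scalar from a Series is applied to every element. This is only a sketch of that alternative, not part of the answer above; the name centered is made up for the example.

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': [None, 15.0, 14.0, 13.0],
    'country': ['Italy', 'Portugal', 'US', 'US'],
    'points': [87, 87, 87, 87],
})

# Work on a copy so the original prices are left untouched.
centered = reviews_2.copy()

# The mean is computed once; the subtraction is applied to every element.
centered['newcol'] = centered['price'] - centered['price'].mean()

print(centered)
```

This avoids recomputing the mean for every element, which the lambda version does on each call.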

Upvotes: 1
