Pandas - Lambda inside apply to return a row

Question

I was expecting to get whole rows when using a lambda function inside apply on a Pandas DataFrame, but it looks like I'm getting a "single element".

Look at this code:

import pandas as pd

# Data sample
reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0}, 
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'}, 
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})

print(reviews_2)

mean_price_2 = reviews_2.price.mean() # the value used for centering

def remean_points(row):
    row.price = row.price - mean_price_2
    return row

centered_price_2 = reviews_2.apply(remean_points, axis='columns') # returns a DataFrame

print(centered_price_2)

That apply returns a DataFrame, which is my expected output!

So I tried to use a lambda function instead:

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0}, 
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'}, 
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
print(reviews_2)

mean_price_2 = reviews_2.price.mean()

centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # returns a Series!

print(centered_price_2)

But now apply returns a Series!

I know apply tries to infer the output type.
I was expecting to get a row, but it seems to return a "single element"...

So my question:

Shouldn't p in the lambda function be a row?

Interesting:

If I do centered_price_2 = reviews_2.apply(lambda p: p, axis='columns'),
I get a DataFrame...

Also:

How can I use lambda functions with apply and be sure about the output type?

marcio · Accepted Answer

I asked this question in 2020, and now, in 2024, reviewing my open questions, I understand Pandas a bit more (just a bit)!

So...

My mistake was here:

mean_price_2 = reviews_2.price.mean()

centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # returns a Series!

Let me explain:

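As I said back then, apply infers the output type from what the function returns.
mean_price_2 = reviews_2.price.mean() is a single number (a float), not a Series.
With axis='columns', apply calls the lambda once per row, so p is a row (a Series), not a whole DataFrame. That makes p.price a single value, and p.price - mean_price_2 a single value too.
apply collects one scalar per row, so centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') returns a Series indexed like the original rows.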
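A quick way to check this reasoning is to ask apply what it actually receives. The sketch below rebuilds the sample data and uses pd.api.types.is_scalar (my choice for the check, not part of the original code) to confirm that mean_price_2 is a plain number, that p is a row Series, and that p.price is a single value:

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
    'points': {0: 87, 1: 87, 2: 87, 3: 87},
})

# Series.mean() skips the missing price and returns a single float, not a Series
mean_price_2 = reviews_2.price.mean()
print(type(mean_price_2))

# With axis='columns', apply passes each row to the function as a Series
row_types = reviews_2.apply(lambda p: type(p).__name__, axis='columns')
print(row_types)

# p.price is one cell of that row, so the lambda only ever sees a scalar
price_is_scalar = reviews_2.apply(lambda p: pd.api.types.is_scalar(p.price), axis='columns')
print(price_is_scalar)
```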

In 2020, I wrongly thought that lambda p: ... should always return a row, since p is a row. But the type the lambda returns comes from the expression it evaluates: one scalar per row gives a Series, while one Series per row (as in lambda p: p) gives a DataFrame.
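Here is a small sketch of that rule, reusing the sample data. The only difference between the two calls is what the lambda evaluates to; the column name centered_price is just an illustrative choice:

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
    'points': {0: 87, 1: 87, 2: 87, 3: 87},
})
mean_price_2 = reviews_2.price.mean()

# The lambda evaluates to one number per row -> apply returns a Series
as_series = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns')
print(type(as_series).__name__)  # Series

# The lambda evaluates to a Series per row -> apply returns a DataFrame,
# and the index of each returned Series becomes the columns
as_frame = reviews_2.apply(
    lambda p: pd.Series({'centered_price': p.price - mean_price_2, 'country': p.country}),
    axis='columns',
)
print(type(as_frame).__name__)  # DataFrame
print(as_frame)
```

So if I want a row back from a lambda, the lambda's expression itself has to build and return a row.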

One solution to fix my code would be:

import pandas as pd

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0}, 
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'}, 
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})

print(reviews_2)

mean_price_2 = reviews_2.price.mean()

# note the next two lines
centered_price_2 = reviews_2.copy() # a real copy; plain assignment would only alias reviews_2
centered_price_2['price'] = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # only change the desired column!

print(centered_price_2)
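For this particular task apply is not really needed. A vectorized sketch (an alternative, not the code above) subtracts the mean from the whole column at once; DataFrame.assign returns a new DataFrame, so reviews_2 is left unchanged:

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
    'points': {0: 87, 1: 87, 2: 87, 3: 87},
})

# Subtract the column mean from every price in one vectorized step.
# The missing price stays NaN; assign() builds a new DataFrame instead of modifying reviews_2.
centered_price_2 = reviews_2.assign(price=reviews_2['price'] - reviews_2['price'].mean())

print(centered_price_2)
```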

Happy 2024!
