Reputation: 25997
I have a dataframe like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': range(4), 'b': range(2, 6)})
   a  b
0  0  2
1  1  3
2  2  4
3  3  5
and I have a function that returns several values. Here I just use a dummy function that returns the minimum and maximum for a certain input iterable:
def return_min_max(x):
    return (np.min(x), np.max(x))
Now I want to e.g. add the maximum of each column to each value in the respective column.
So
df.apply(return_min_max)
gives
a (0, 3)
b (2, 5)
and then
df.add(df.apply(return_min_max).apply(lambda x: x[1]))
yields the desired outcome
   a   b
0  3   7
1  4   8
2  5   9
3  6  10
I am wondering whether there is a more straightforward way that avoids the two chained apply calls.
Just to make sure:
I am NOT interested in a
df.add(df.max())
type solution. I highlighted that this is a dummy function to illustrate that it is not my actual function but just serves as a minimal example of a function that has several outputs.
Upvotes: 2
Views: 47
Reputation: 150735
On second look, your return_min_max is a column-wise function, so it is not that bad. You can do, e.g.:
# create a dataframe for easy access
ret_df = pd.DataFrame(df.apply(return_min_max).to_dict())
# a b
# 0 0 2
# 1 3 5
# add
df.add(ret_df.loc[1], axis=1)
Output:
a b
0 3 7
1 4 8
2 5 9
3 6 10
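If you would rather not depend on what apply returns for tuple results (that may differ between pandas versions, which is an assumption worth checking on your install), a minimal sketch is to call the function once per column yourself and label the rows. The names stats and result and the row labels 'min' and 'max' are illustrative, not part of the original code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(4), 'b': range(2, 6)})

def return_min_max(x):
    return (np.min(x), np.max(x))

# One call per column; each tuple becomes a column of the helper frame,
# and the (hypothetical) row labels say which output is which.
stats = pd.DataFrame(
    {col: return_min_max(df[col]) for col in df.columns},
    index=['min', 'max'],
)

# Select the row of maxima and add it, aligning on column names.
result = df.add(stats.loc['max'], axis=1)
```

Selecting by label (stats.loc['max']) instead of by position makes it harder to pick the wrong output when the real function returns more than two values.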
And numpy broadcast:
df.values[None,:] + ret_df.values[:,None]
gives:
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 3,  7],
        [ 4,  8],
        [ 5,  9],
        [ 6, 10]]], dtype=int64)
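The broadcast above adds every returned value at once: the first block is df plus the minima, the second is df plus the maxima. If only the maxima are needed, a rough sketch (stats and result are hypothetical names, and the function is called in a plain loop over the columns) could look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(4), 'b': range(2, 6)})

def return_min_max(x):
    return (np.min(x), np.max(x))

# One row per column of df, one entry per returned value:
# shape (n_columns, 2).
stats = np.array([return_min_max(df[col]) for col in df.columns])

# stats[:, 1] holds the per-column maxima, shape (n_columns,), which
# broadcasts against the (n_rows, n_columns) values of df.
result = pd.DataFrame(
    df.to_numpy() + stats[:, 1],
    index=df.index,
    columns=df.columns,
)
```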
Upvotes: 3
Reputation: 59519
DataFrame.max returns a Series of the column-wise maximum values. DataFrame.add() then adds this Series, aligning on columns.
df.add(df.max())
# a b
#0 3 7
#1 4 8
#2 5 9
#3 6 10
If your real function is much more complicated, there are a few alternatives.
Keep it as is, use .str
to access the max element.
def return_min_max(x):
    return (np.min(x), np.max(x))
df.add(df.apply(return_min_max).str[1])
Consider returning a Series with the index being descriptive about what is returned:
def return_min_max(x):
    return pd.Series([np.min(x), np.max(x)], index=['min', 'max'])
df.add(df.apply(return_min_max).loc['max'])
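To see why this avoids the second apply, here is a small sketch that keeps the intermediate result around. The names stats, shifted_by_max and shifted_by_min are only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(4), 'b': range(2, 6)})

def return_min_max(x):
    return pd.Series([np.min(x), np.max(x)], index=['min', 'max'])

# A single apply: because the function returns a labelled Series, the
# result is a DataFrame with rows 'min' and 'max' and columns 'a' and 'b'.
stats = df.apply(return_min_max)

# Each output can now be picked by label and reused without calling
# the function again.
shifted_by_max = df.add(stats.loc['max'])
shifted_by_min = df.add(stats.loc['min'])
```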
Or if the returns can be separated (in this case max and min really don't need to be computed in the same function), it's simpler to keep them separate:
def return_max(x):
    return np.max(x)
df.add(df.apply(return_max))
Upvotes: 2