Reputation: 15783
I'd like to apply a function that returns multiple values to a pandas DataFrame and put the results in separate new columns of that DataFrame.
So given something like this:
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)
The goal is a single command that calls add_subtract on a and b to create two new columns in df: sum and difference.
I thought something like this might work:
(df['sum'], df['difference']) = df.apply(
    lambda row: add_subtract(row['a'], row['b']), axis=1)
But it yields this error:
----> 9 lambda row: add_subtract(row['a'], row['b']), axis=1)
ValueError: too many values to unpack (expected 2)
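As far as I can tell, the unpacking fails because apply with axis=1 returns one tuple per row, so the result has three elements (one per row) rather than two (one per new column). A minimal sketch that reproduces this with the same df and add_subtract as above:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

# With axis=1 and a tuple return value, apply produces a Series
# holding one (sum, difference) tuple per row.
result = df.apply(lambda row: add_subtract(row['a'], row['b']), axis=1)

print(len(result))     # number of rows, not number of new columns
print(result.iloc[0])  # the tuple computed for the first row

# Unpacking iterates over the rows, so two targets cannot absorb three tuples.
unpack_error = None
try:
    first, second = result
except ValueError as exc:
    unpack_error = exc
    print(exc)
```

A DataFrame with exactly two rows would unpack without an error, but each target would then receive a row's tuple rather than a column of values.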
EDIT: In addition to the answers below, the related question "pandas apply function that returns multiple values to rows in pandas dataframe" shows that the function can be modified to return a list or a Series, i.e.:
def add_subtract_list(a, b):
    return [a + b, a - b]

df[['sum', 'difference']] = df.apply(
    lambda row: add_subtract_list(row['a'], row['b']), axis=1)
or
def add_subtract_series(a, b):
    return pd.Series((a + b, a - b))

df[['sum', 'difference']] = df.apply(
    lambda row: add_subtract_series(row['a'], row['b']), axis=1)
both work (the latter being equivalent to Wen's accepted answer).
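Another option that leaves add_subtract unchanged is to pass result_type='expand' to apply, which asks pandas to spread list-like return values across columns. This is only a sketch and assumes pandas 0.23 or newer, where that parameter is available:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

# result_type='expand' turns each returned tuple into a row of a new
# DataFrame (columns 0 and 1), which is then assigned to the two new columns.
df[['sum', 'difference']] = df.apply(
    lambda row: add_subtract(row['a'], row['b']),
    axis=1, result_type='expand')

print(df)
```

Because the expansion happens inside apply, the function itself does not need to know about pandas.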
Upvotes: 8
Views: 5073
Reputation: 13274
One way to do this would be to use pd.DataFrame.assign as follows:
df.assign(**{k:v for k,v in zip(['sum', 'difference'], add_subtract(df.a, df.b))})
Should yield:
   a  b  difference  sum
0  1  4          -3    5
1  2  5          -3    7
2  3  6          -3    9
zip is a built-in function that returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument iterables. For instance, list(zip(['sum', 'difference'], [df.a + df.b, df.a - df.b])) should return [('sum', df.a + df.b), ('difference', df.a - df.b)].
** in front of a dictionary object in a function call unpacks the dictionary's key and value pairs into keyword arguments. In essence, the unpacking could be represented as something like this: sum=df.a + df.b, difference=df.a - df.b.
In sum, when combined, you get something like the following:
df.assign(sum=df.a + df.b, difference=df.a - df.b)
See the Python documentation for both zip and the ** operator in front of a dictionary object to get a better idea of how these useful tools work beyond this particular example.
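To make the steps above concrete, here is a small sketch that builds the dictionary first and then unpacks it into assign. The names `names`, `new_columns`, and `result` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

names = ['sum', 'difference']

# zip pairs each name with the matching Series returned by add_subtract,
# and dict turns those pairs into {'sum': ..., 'difference': ...}.
new_columns = dict(zip(names, add_subtract(df['a'], df['b'])))

# ** unpacks the dictionary into keyword arguments:
# df.assign(sum=..., difference=...)
result = df.assign(**new_columns)

print(result)
```

Note that assign returns a new DataFrame and leaves df itself unchanged, so the result has to be bound to a name (or back to df) to keep the new columns. The order of the new columns may also differ from the output shown above depending on the Python and pandas versions in use.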
Upvotes: 3
Reputation: 323226
Wrap the returned tuple in pd.Series so that apply expands it into columns:
df[['sum', 'difference']] = df.apply(
    lambda row: pd.Series(add_subtract(row['a'], row['b'])), axis=1)
df
yields
   a  b  sum  difference
0  1  4    5          -3
1  2  5    7          -3
2  3  6    9          -3
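If the function only uses operations that also work on whole columns, as add_subtract does, a row-wise apply may not be needed at all. The following sketch passes the columns directly and unpacks the returned tuple of two Series; it will not carry over to functions that only accept scalars:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

# Passing whole columns returns a tuple of two Series, which unpacks
# directly into the two new columns without a row-wise apply.
df['sum'], df['difference'] = add_subtract(df['a'], df['b'])

print(df)
```

Here the unpacking succeeds because the tuple has exactly two elements, one per new column.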
Upvotes: 9