Max Ghenis

Reputation: 15783

Create multiple pandas DataFrame columns from applying a function with multiple returns

I'd like to apply a function with multiple returns to a pandas DataFrame and put the results in separate new columns in that DataFrame.

So given something like this:

import pandas as pd

df = pd.DataFrame(data = {'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
  return (a + b, a - b)

The goal is a single command that calls add_subtract on a and b to create two new columns in df: sum and difference.

I thought something like this might work:

(df['sum'], df['difference']) = df.apply(
    lambda row: add_subtract(row['a'], row['b']), axis=1)

But it yields this error:

----> 9 lambda row: add_subtract(row['a'], row['b']), axis=1)

ValueError: too many values to unpack (expected 2)
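The error can be reproduced and inspected with a short sketch, assuming a recent pandas version (0.23 or later). There, apply with a tuple-returning function gives back one Series holding a tuple per row, so unpacking it iterates over three rows rather than two columns. The name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

# With a tuple return value, apply(axis=1) produces a single Series of tuples,
# one element per row, rather than two separate columns.
result = df.apply(lambda row: add_subtract(row['a'], row['b']), axis=1)

print(type(result).__name__)  # the container apply returned
print(len(result))            # one entry per row of df
print(result.iloc[0])         # the (sum, difference) pair for the first row
```

Unpacking that object into two targets fails because it has as many items as df has rows.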

EDIT: In addition to the answers below, the related question "pandas apply function that returns multiple values to rows in pandas dataframe" shows that the function can be modified to return a list or a Series instead of a tuple, i.e.:

def add_subtract_list(a, b):
  return [a + b, a - b]

df[['sum', 'difference']] = df.apply(
    lambda row: add_subtract_list(row['a'], row['b']), axis=1)

or

def add_subtract_series(a, b):
  return pd.Series((a + b, a - b))

df[['sum', 'difference']] = df.apply(
    lambda row: add_subtract_series(row['a'], row['b']), axis=1)

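Both worked when this was written (the latter being equivalent to Wen's accepted answer). Note that since pandas 0.23, apply with axis=1 returns a Series of lists for the list version unless result_type='expand' is passed, so the Series version is the safer of the two.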
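For reference, here is a minimal sketch that keeps the original tuple-returning add_subtract unchanged and relies on the result_type='expand' option of DataFrame.apply, assuming pandas 0.23 or later where that parameter is available.

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

# result_type='expand' turns each returned tuple into columns 0 and 1,
# which are then assigned positionally to the two new column names.
df[['sum', 'difference']] = df.apply(
    lambda row: add_subtract(row['a'], row['b']),
    axis=1, result_type='expand')

print(df)
```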

Upvotes: 8

Views: 5073

Answers (2)

Abdou

Reputation: 13274

One way to do this is to use pd.DataFrame.assign, which returns a new DataFrame with the extra columns rather than modifying df in place:

df.assign(**{k:v for k,v in zip(['sum', 'difference'], add_subtract(df.a, df.b))})

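This should yield the following (the column order may differ: older pandas versions sort the new columns alphabetically, while pandas 0.23 and later on Python 3.6+ keep the keyword order, sum then difference):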

   a  b  difference  sum
0  1  4          -3    5
1  2  5          -3    7
2  3  6          -3    9

Clarifications:

zip is a builtin function that returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. For instance, list(zip(['sum', 'difference'], [df.a + df.b, df.a - df.b])) should return [('sum', df.a + df.b), ('difference', df.a - df.b)].

** in front of a dictionary unpacks its key-value pairs into keyword arguments. Here, the unpacked arguments amount to this: sum=df.a + df.b, difference=df.a - df.b.

In sum, when combined, you get something like the following:

df.assign(sum=df.a + df.b, difference=df.a - df.b)

See the Python documentation on zip and on ** dictionary unpacking in function calls to get a better idea of how these tools work beyond this particular example.
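The two steps described above can also be written out separately, as a rough sketch. The names new_columns and out are illustrative only and not part of the original answer; dict(zip(...)) is used here as a shorter equivalent of the dict comprehension.

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

# Step 1: zip pairs each new column name with the matching Series
# returned by the vectorized call add_subtract(df.a, df.b).
new_columns = dict(zip(['sum', 'difference'], add_subtract(df.a, df.b)))

# Step 2: ** unpacks the dict into keyword arguments,
# i.e. assign(sum=..., difference=...).
# assign returns a new DataFrame; df itself is left unchanged.
out = df.assign(**new_columns)

print(out)
```

To keep the new columns on df, assign the result back with df = df.assign(...).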

Upvotes: 3

BENY

Reputation: 323226

Wrap the returned tuple in pd.Series, so that apply expands each row's result into separate columns:

df[['sum', 'difference']] = df.apply(
    lambda row: pd.Series(add_subtract(row['a'], row['b'])), axis=1)
df

yields

   a  b  sum  difference
0  1  4    5          -3
1  2  5    7          -3
2  3  6    9          -3
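As a side note, add_subtract here only uses + and -, which pandas applies element-wise to whole columns. Under that assumption (it would not hold for a function that needs scalar inputs), a row-wise apply can be skipped and the returned tuple unpacked directly. This is an alternative sketch, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

# Calling the function on whole columns returns a tuple of two Series,
# which unpacks cleanly into the two new columns.
df['sum'], df['difference'] = add_subtract(df['a'], df['b'])

print(df)
```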

Upvotes: 9
