ℕʘʘḆḽḘ

Reputation: 19405

How to create multiple columns at once with apply?

Consider this example

import pandas as pd
import numpy as np

df = pd.DataFrame({'var1' : [1,2,3,4],
                   'var2' : ['a','b','c','d']})
df
Out[100]: 
   var1 var2
0     1    a
1     2    b
2     3    c
3     4    d

I have a function that takes var1 as input and returns three values that I want to store in three different columns. The following seems to work correctly:

    def myfunc(var):
        return [['small list'], var + 2, ['another list']]
    
    df.var1.apply(lambda x: myfunc(x))
    Out[101]: 
    0    [[small list], 3, [another list]]
    1    [[small list], 4, [another list]]
    2    [[small list], 5, [another list]]
    3    [[small list], 6, [another list]]
    Name: var1, dtype: object

However, when I try to create the corresponding columns, I get an error:

df[['my small list', 'my numeric', 'other list']]  = df.var1.apply(lambda x: myfunc(x))
ValueError: Must have equal len keys and value when setting with an iterable

What do you think?

I used to use the great zip solution from "Return multiple columns from pandas apply()", but with the current pandas 1.2 this solution no longer works.

Thanks!

Upvotes: 1

Views: 1841

Answers (4)

Ynjxsjmh

Reputation: 30070

You can use the result_type argument of pandas.DataFrame.apply():

df[['my small list', 'my numeric', 'other list']]  = df.apply(lambda x: myfunc(x.var1), axis=1, result_type='expand')
# print(df)

   var1 var2 my small list  my numeric      other list
0     1    a  [small list]           3  [another list]
1     2    b  [small list]           4  [another list]
2     3    c  [small list]           5  [another list]
3     4    d  [small list]           6  [another list]
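
A self-contained variant of the same idea, in case you prefer to name the expanded columns explicitly and attach them with join instead of assigning to several columns at once. This is only a sketch: the `expanded` and `out` names are mine, and it assumes a pandas version where result_type='expand' turns list-like results into columns.

```python
import pandas as pd

df = pd.DataFrame({'var1': [1, 2, 3, 4],
                   'var2': ['a', 'b', 'c', 'd']})

def myfunc(var):
    return [['small list'], var + 2, ['another list']]

# Each row's returned list is spread over columns 0, 1, 2.
expanded = df.apply(lambda row: myfunc(row['var1']), axis=1, result_type='expand')

# Name the columns, then attach them to the original frame by index.
expanded.columns = ['my small list', 'my numeric', 'other list']
out = df.join(expanded)
print(out)
```

Because join matches on the index, the original df is left untouched and `out` holds the combined result.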

Upvotes: 0

zwithouta

Reputation: 1591

Returning a Series is possibly the most readable solution.

def myfunc(var):
    return pd.Series([['small list'], var + 2, ['another list']])

df[['my small list', 'my numeric', 'other list']]  = df.var1.apply(lambda x: myfunc(x))
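
If you go this route, one option (a sketch, not the only way) is to give the returned Series a labelled index so the new columns arrive already named. Series.apply returns a DataFrame when the function returns a Series. The `names` list and the `expanded` and `out` variables below are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'var1': [1, 2, 3, 4],
                   'var2': ['a', 'b', 'c', 'd']})

names = ['my small list', 'my numeric', 'other list']

def myfunc(var):
    # The Series index becomes the column names of the expanded result.
    return pd.Series([['small list'], var + 2, ['another list']], index=names)

expanded = df['var1'].apply(myfunc)   # DataFrame with one column per label
out = df.join(expanded)
print(out)
```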

However, for larger DataFrames you should prefer either the zip or the DataFrame approach, as the following benchmark illustrates.

import pandas as pd # 1.2.2
import perfplot

def setup(n):
    return pd.DataFrame(dict(
        var1=list(range(n))
    ))
 
def with_series(df):
    def myfunc(var):
        return pd.Series([['small list'], var + 2, ['other list']])
    out = pd.DataFrame()
    out[['small list', 'numeric', 'other list']] = df.var1.apply(lambda x: myfunc(x))

def with_zip(df):
    def myfunc(var):
        return [['small list'], var + 2, ['other list']]
    out = pd.DataFrame()
    out['small list'], out['numeric'], out['other list'] = list(zip(*df.var1.apply(lambda x: myfunc(x))))

def with_dataframe(df):
    def myfunc(var):
        return [['small list'], var + 2, ['other list']]
    out = pd.DataFrame()
    out[['small list', 'numeric', 'other list']] = pd.DataFrame(df.var1.apply(myfunc).to_list())


perfplot.show(
    setup=setup,
    kernels=[
        with_series,
        with_zip,
        with_dataframe,
    ],
    labels=["series", "zip", "df"],
    n_range=[2 ** k for k in range(20)],
    xlabel="len(df)",
    equality_check=None,
)

[Figure: perfplot benchmark of the "series", "zip", and "df" kernels against len(df)]
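
Since the benchmark runs with equality_check=None and the kernels return nothing, it does not confirm that the three approaches agree. A small sanity check along these lines can do that. It is a sketch only: the `as_rows` helper and the variable names are made up for illustration, and each result is built directly rather than through multi-column assignment.

```python
import pandas as pd

def myfunc(var):
    return [['small list'], var + 2, ['other list']]

cols = ['small list', 'numeric', 'other list']
df = pd.DataFrame({'var1': range(5)})

# Series approach: apply returns a DataFrame with columns 0, 1, 2.
via_series = df['var1'].apply(lambda x: pd.Series(myfunc(x)))
via_series.columns = cols

# zip approach: transpose the per-row lists into one sequence per column.
small, numeric, other = zip(*df['var1'].apply(myfunc))
via_zip = pd.DataFrame({'small list': list(small),
                        'numeric': list(numeric),
                        'other list': list(other)})

# DataFrame approach: build the frame from the list of per-row lists.
via_dataframe = pd.DataFrame(df['var1'].apply(myfunc).to_list(), columns=cols)

def as_rows(frame):
    # Compare plain Python values so differing dtypes do not matter.
    return frame.to_numpy().tolist()

print(as_rows(via_series) == as_rows(via_zip) == as_rows(via_dataframe))
```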

Upvotes: 5

André C. Andersen

Reputation: 9405

The zip method still seems to work fine:

import pandas as pd
import numpy as np

df = pd.DataFrame({'var1' : [1,2,3,4],
                   'var2' : ['a','b','c','d']})

def myfunc(var):
    return [['small list'], var + 2, ['another list']]

df['my small list'], df['my numeric'], df['other list'] = zip(*df.var1.apply(lambda x: myfunc(x)))

See also: Return multiple columns from pandas apply()

The really odd thing is that the inner lists end up coerced into tuples. From experimenting, it seems to matter that the outer object is a list.

To force the inner lists to stay lists, I had to do the following:

df['my small list'], df['my numeric'], df['other list'] = (list(row) for row in zip(*df.var1.apply(lambda x: myfunc(x))))
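
One thing that may account for the tuples: zip itself produces them. zip(*rows) transposes the per-row lists and yields one tuple per output column, while the elements inside each tuple keep their original type. A pandas-free sketch of just that step (the variable names are illustrative):

```python
def myfunc(var):
    return [['small list'], var + 2, ['another list']]

rows = [myfunc(v) for v in [1, 2, 3, 4]]    # one list per input value

# Unpacking with * passes each row as a separate argument, so zip pairs up
# the first elements, the second elements, and the third elements.
small, numeric, other = zip(*rows)

print(type(numeric).__name__, numeric)      # tuple (3, 4, 5, 6)
print(type(small[0]).__name__, small[0])    # list ['small list']
```

Wrapping each of those tuples in list(...), as in the line above, only changes the container that holds a column's values, not the values themselves.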

Upvotes: 3

dm2

Reputation: 4275

Using the method from this Stack Overflow question, you just need to split the pandas Series returned by df.var1.apply(myfunc) into columns.

What I did was:

df[['out1','out2','out3']] = pd.DataFrame(df['var1'].apply(myfunc).to_list())

As you can see, this doesn't overwrite your DataFrame, just assigns the resulting columns to new columns in your DataFrame.

DataFrame after the apply method:

   var1 var2          out1  out2            out3
0     1    a  [small list]     3  [another list]
1     2    b  [small list]     4  [another list]
2     3    c  [small list]     5  [another list]
3     4    d  [small list]     6  [another list]
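
One caveat worth guarding against: the helper DataFrame gets a fresh 0..n-1 index, and as far as I know assigning a DataFrame aligns on index labels, so a frame with a non-default index can end up misaligned. A sketch that passes the original index along explicitly (the index values 10 to 13 and the `new_cols` and `out` names are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({'var1': [1, 2, 3, 4],
                   'var2': ['a', 'b', 'c', 'd']},
                  index=[10, 11, 12, 13])   # deliberately not 0..3

def myfunc(var):
    return [['small list'], var + 2, ['another list']]

# Reuse df.index so the new columns line up with the original rows.
new_cols = pd.DataFrame(df['var1'].apply(myfunc).to_list(),
                        index=df.index,
                        columns=['out1', 'out2', 'out3'])
out = df.join(new_cols)
print(out)
```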

Upvotes: 3
