Reputation: 19405
Consider this example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'var1' : [1,2,3,4],
'var2' : ['a','b','c','d']})
df
Out[100]:
var1 var2
0 1 a
1 2 b
2 3 c
3 4 d
I have a function that takes var1 as input and returns three values that I want to store in three different variables. The following seems to work correctly:
def myfunc(var):
    return [['small list'], var + 2, ['another list']]
df.var1.apply(lambda x: myfunc(x))
Out[101]:
0 [[small list], 3, [another list]]
1 [[small list], 4, [another list]]
2 [[small list], 5, [another list]]
3 [[small list], 6, [another list]]
Name: var1, dtype: object
However, when I try to create the corresponding variables, I get an error:
df[['my small list', 'my numeric', 'other list']] = df.var1.apply(lambda x: myfunc(x))
ValueError: Must have equal len keys and value when setting with an iterable
What do you think?
I used to use the great zip solution from "Return multiple columns from pandas apply()", but with the current pandas 1.2 this solution no longer works.
Thanks!
Upvotes: 1
Views: 1841
Reputation: 30070
You can use the result_type argument of pandas.DataFrame.apply():
df[['my small list', 'my numeric', 'other list']] = df.apply(lambda x: myfunc(x.var1), axis=1, result_type='expand')
# print(df)
var1 var2 my small list my numeric other list
0 1 a [small list] 3 [another list]
1 2 b [small list] 4 [another list]
2 3 c [small list] 5 [another list]
3 4 d [small list] 6 [another list]
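To make the effect of result_type='expand' easier to see, here is a minimal self-contained sketch that keeps the expanded result in an intermediate variable before assigning it. The name `expanded` is only illustrative and not part of the original answer; the rest mirrors the question's setup.

```python
import pandas as pd

def myfunc(var):
    # Same three return values as in the question.
    return [['small list'], var + 2, ['another list']]

df = pd.DataFrame({'var1': [1, 2, 3, 4],
                   'var2': ['a', 'b', 'c', 'd']})

# With axis=1 the lambda receives one row at a time; result_type='expand'
# turns each returned list into columns labelled 0, 1, 2.
expanded = df.apply(lambda row: myfunc(row['var1']), axis=1, result_type='expand')

# Assign the three expanded columns to three new column names.
df[['my small list', 'my numeric', 'other list']] = expanded
print(df)
```

Inspecting `expanded` first can help confirm that the function really returned the same number of items for every row before the assignment.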
Upvotes: 0
Reputation: 1591
Returning a Series is possibly the most readable solution.
def myfunc(var):
    return pd.Series([['small list'], var + 2, ['another list']])
df[['my small list', 'my numeric', 'other list']] = df.var1.apply(lambda x: myfunc(x))
However, for larger dataframes you should prefer either the zip or the dataframe approach.
import pandas as pd  # 1.2.2
import perfplot

def setup(n):
    return pd.DataFrame(dict(
        var1=list(range(n))
    ))

def with_series(df):
    def myfunc(var):
        return pd.Series([['small list'], var + 2, ['other list']])
    out = pd.DataFrame()
    out[['small list', 'numeric', 'other list']] = df.var1.apply(lambda x: myfunc(x))

def with_zip(df):
    def myfunc(var):
        return [['small list'], var + 2, ['other list']]
    out = pd.DataFrame()
    out['small list'], out['numeric'], out['other list'] = list(zip(*df.var1.apply(lambda x: myfunc(x))))

def with_dataframe(df):
    def myfunc(var):
        return [['small list'], var + 2, ['other list']]
    out = pd.DataFrame()
    out[['small list', 'numeric', 'other list']] = pd.DataFrame(df.var1.apply(myfunc).to_list())

perfplot.show(
    setup=setup,
    kernels=[
        with_series,
        with_zip,
        with_dataframe,
    ],
    labels=["series", "zip", "df"],
    n_range=[2 ** k for k in range(20)],
    xlabel="len(df)",
    equality_check=None,
)
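If perfplot is not installed, a rough comparison can be sketched with the standard library's timeit instead. This is only an approximation of the benchmark above: the frame size of 1000 rows, the three repetitions, and the choice to build and return a new DataFrame in each function are assumptions made for this sketch, so absolute numbers will differ.

```python
import timeit

import pandas as pd

COLS = ["small list", "numeric", "other list"]

def myfunc(var):
    return [["small list"], var + 2, ["other list"]]

def with_series(df):
    # Wrap each result in a Series so apply() returns a DataFrame.
    res = df["var1"].apply(lambda x: pd.Series(myfunc(x)))
    res.columns = COLS
    return res

def with_zip(df):
    # Transpose the per-row lists into three per-column tuples.
    small, numeric, other = zip(*df["var1"].apply(myfunc))
    return pd.DataFrame(
        {"small list": list(small), "numeric": list(numeric), "other list": list(other)},
        index=df.index,
    )

def with_dataframe(df):
    # Build a DataFrame directly from the list of per-row lists.
    return pd.DataFrame(df["var1"].apply(myfunc).to_list(), columns=COLS, index=df.index)

df = pd.DataFrame({"var1": range(1000)})  # assumed size, adjust as needed

timings = {
    f.__name__: timeit.timeit(lambda f=f: f(df), number=3)
    for f in (with_series, with_zip, with_dataframe)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 3 runs")
```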
Upvotes: 5
Reputation: 9405
The zip method still seems to work fine:
import pandas as pd
import numpy as np
df = pd.DataFrame({'var1' : [1,2,3,4],
'var2' : ['a','b','c','d']})
def myfunc(var):
    return [['small list'], var + 2, ['another list']]
df['my small list'], df['my numeric'], df['other list'] = zip(*df.var1.apply(lambda x: myfunc(x)))
(This is the approach from "Return multiple columns from pandas apply()".)
The really odd thing is how the inner lists are being coerced into tuples. From experimenting, it seems to matter that the outer type is a list.
To force the inner lists to stay lists, I had to do the following:
df['my small list'], df['my numeric'], df['other list'] = (list(row) for row in zip(*df.var1.apply(lambda x: myfunc(x))))
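One place tuples can come from is zip itself: zip(*rows) yields every transposed column as a tuple. The plain-Python sketch below (no pandas, with hand-written `rows` standing in for the apply() result) shows only that part; how a given pandas version then treats a tuple on assignment is not covered here.

```python
# Two rows shaped like the output of myfunc in the question.
rows = [
    [['small list'], 3, ['another list']],
    [['small list'], 4, ['another list']],
]

# zip(*rows) transposes rows into columns; each column is a tuple.
columns = list(zip(*rows))
print(type(columns[0]))      # the container for the first column
print(type(columns[0][0]))   # an element of that column is still a list

# Converting each column back to a list, as in the generator expression above.
columns_as_lists = [list(col) for col in columns]
print(columns_as_lists[1])
```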
Upvotes: 3
Reputation: 4275
Using the method from this Stack Overflow question, you just need to split the pandas Series returned by df.var1.apply(myfunc) into columns.
What I did was:
df[['out1','out2','out3']] = pd.DataFrame(df['var1'].apply(myfunc).to_list())
As you can see, this doesn't overwrite your DataFrame, just assigns the resulting columns to new columns in your DataFrame.
DataFrame after the apply method:
var1 var2 out1 out2 out3
0 1 a [small_list] 3 [another_list]
1 2 b [small_list] 4 [another_list]
2 3 c [small_list] 5 [another_list]
3 4 d [small_list] 6 [another_list]
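One caveat worth sketching: pd.DataFrame(...to_list()) gets a fresh 0..n-1 index, so if df has a different index the new columns may not line up with its rows. Passing index=df.index avoids that. The index values 10, 20, 30, 40 and the name `expanded` below are made up for illustration.

```python
import pandas as pd

def myfunc(var):
    return [['small list'], var + 2, ['another list']]

# A frame with a non-default index (hypothetical values).
df = pd.DataFrame({'var1': [1, 2, 3, 4],
                   'var2': ['a', 'b', 'c', 'd']},
                  index=[10, 20, 30, 40])

# Reuse df's index so the expanded columns align with the original rows.
expanded = pd.DataFrame(df['var1'].apply(myfunc).to_list(),
                        columns=['out1', 'out2', 'out3'],
                        index=df.index)

df[['out1', 'out2', 'out3']] = expanded
print(df)
```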
Upvotes: 3