Reputation: 1152
I am trying to preprocess a dataset with pandas. I want to use a function with multiple arguments (one from a column of the dataframe, others are variables) which returns several outputs like this:
def preprocess(Series, var1, var2, var3, var4):
    return 1, 2, 3, 4
I want to use the native pandas.apply to use this function on one column of my dataframe like this:
import pandas as pd
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df['C'], df['D'], df['E'], df['F'] = df.apply(lambda x: preprocess(x['A'], 1, 2, 3, 4), axis=1)
But the last line gives me the following error:
ValueError: not enough values to unpack (expected 4, got 3)
I understand that my last line returns one tuple of 4 values (1, 2, 3, 4) per row, whereas I wanted each of these values to go into the columns C, D, etc.
How can I perform this?
Upvotes: 0
Views: 193
Reputation: 150735
You need to rewrite your function to return a Series; that way, apply returns a DataFrame:
def preprocess(Series, var1, var2, var3, var4):
    return pd.Series([1, 2, 3, 4])
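Then assign the resulting DataFrame to the four new columns in one step (unpacking it into df['C'], df['D'], ... would iterate over its column labels 0 to 3 rather than its values):
df[['C', 'D', 'E', 'F']] = df.apply(lambda x: preprocess(x['A'], 1, 2, 3, 4), axis=1)
which gives
   A  B  C  D  E  F
0  4  9  1  2  3  4
1  4  9  1  2  3  4
2  4  9  1  2  3  4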
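One thing worth checking with the Series-returning version is what tuple unpacking actually receives. The sketch below reuses the placeholder function and data from the question (the parameter name value and the variable names expanded and labels are only illustrative) and shows that iterating a DataFrame yields its column labels, while assigning to a list of column names copies the values:

```python
import pandas as pd

def preprocess(value, var1, var2, var3, var4):
    # Placeholder body from the question: always returns the same four values
    return pd.Series([1, 2, 3, 4])

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# apply with axis=1 and a Series-returning function gives a DataFrame
# whose columns are labelled 0, 1, 2, 3
expanded = df.apply(lambda x: preprocess(x['A'], 1, 2, 3, 4), axis=1)

# Iterating (and therefore tuple-unpacking) a DataFrame yields column labels,
# not the column data
labels = list(expanded)

# Assigning to a list of column names copies the values column by column
df[['C', 'D', 'E', 'F']] = expanded
print(df)
```

With this assignment, columns C to F hold the returned values rather than the labels.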
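Update: Without rewriting the function, convert the Series of tuples to an array and transpose it so that each row of the array is one output column:
import numpy as np
processed = df.apply(lambda x: preprocess(x['A'], 1, 2, 3, 4), axis=1)
df['C'], df['D'], df['E'], df['F'] = np.array(processed.to_list()).T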
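Another option that leaves the tuple-returning function untouched is the result_type='expand' argument of DataFrame.apply, which turns list-like results into columns. This is a minimal sketch using the question's placeholder function and data, not necessarily the best fit for a real preprocessing function:

```python
import pandas as pd

def preprocess(value, var1, var2, var3, var4):
    # Placeholder body from the question: returns a plain tuple
    return 1, 2, 3, 4

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# result_type='expand' spreads each returned tuple across columns 0..3,
# so apply returns a DataFrame instead of a Series of tuples
df[['C', 'D', 'E', 'F']] = df.apply(
    lambda x: preprocess(x['A'], 1, 2, 3, 4),
    axis=1,
    result_type='expand',
)
print(df)
```

This avoids the detour through NumPy and keeps each output column's values aligned with the original row index.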
Upvotes: 1