I am trying to use apply on my DataFrame.
The custom function passed to apply returns 2 values, and those need to populate 2 columns in each row of my DataFrame.
Here is a simple example:
import pandas as pd
df = pd.DataFrame({'a': [10]})
I want to create two columns, b and c: b equals 1 if a is above 0, and c equals 1 if a is above 0 (both are 0 otherwise).
def compute_b_c(a):
    if a > 0:
        return 1, 1
    else:
        return 0, 0
I tried this, but it raises a KeyError:
df[['b', 'c']] = df.a.apply(compute_b_c)
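To see why this does not line up with two columns, it helps to look at what the right-hand side actually produces. A minimal sketch, assuming the same compute_b_c and a three-row frame (the extra rows are only for illustration): Series.apply returns a single Series whose values are tuples, not a two-column DataFrame, so there is nothing that maps onto ['b', 'c'].

```python
import pandas as pd

def compute_b_c(a):
    # Return the (b, c) pair for a single value of a
    if a > 0:
        return 1, 1
    else:
        return 0, 0

df = pd.DataFrame({'a': [10, -1, 9]})

# Series.apply keeps each returned tuple as one value in a single Series
result = df.a.apply(compute_b_c)
print(type(result).__name__)  # Series, not DataFrame
print(result.tolist())        # [(1, 1), (0, 0), (1, 1)]
```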
Use the result_type parameter of pandas.DataFrame.apply. This only works when you call apply on df (the DataFrame), not on df.a (a Series), because Series.apply has no result_type parameter:
df[['b', 'c']] = df.apply(lambda row: compute_b_c(row['a']), result_type='expand', axis=1)
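A minimal sketch of what result_type='expand' does, assuming a three-row frame like the one in the other answer: each returned tuple becomes one row of a new DataFrame whose columns are labelled 0 and 1, and assigning that frame to ['b', 'c'] takes its columns in order. The positional assignment shown here is how recent pandas versions behave; older versions may differ.

```python
import pandas as pd

def compute_b_c(a):
    # Same rule as in the question, written as a conditional expression
    return (1, 1) if a > 0 else (0, 0)

df = pd.DataFrame({'a': [10, -1, 9]})

# With result_type='expand', each returned tuple is spread into a row of a
# new DataFrame whose columns are labelled 0 and 1
expanded = df.apply(lambda row: compute_b_c(row['a']), result_type='expand', axis=1)
print(expanded.columns.tolist())  # [0, 1]

# Assigning to a list of new column names takes the columns in order
df[['b', 'c']] = expanded
print(df)
```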
It is possible with the DataFrame constructor. Also note that return 1,1 and return 0,0 already return the tuples (1,1) and (0,0):
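import pandas as pd

df = pd.DataFrame({'a': [10, -1, 9]})

def compute_b_c(a):
    if a > 0:
        return (1, 1)
    else:
        return (0, 0)

# Turn the Series of tuples into a list, then let the constructor spread it into two columns
df[['b', 'c']] = pd.DataFrame(df.a.apply(compute_b_c).tolist())
print(df)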
    a  b  c
0  10  1  1
1  -1  0  0
2   9  1  1
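One caveat with the constructor approach: pd.DataFrame(list_of_tuples) gets a fresh 0..n-1 index, and assigning it to df aligns on index labels. That is harmless with the default index used above, but with any other index the new columns would come out as NaN. A minimal sketch, assuming a hypothetical frame indexed by 'x', 'y', 'z', that passes index=df.index to keep the rows aligned:

```python
import pandas as pd

def compute_b_c(a):
    return (1, 1) if a > 0 else (0, 0)

# Hypothetical frame with a non-default index (labels 'x', 'y', 'z')
df = pd.DataFrame({'a': [10, -1, 9]}, index=['x', 'y', 'z'])

pairs = df.a.apply(compute_b_c).tolist()  # list of (b, c) tuples

# Without index=..., the new frame would get a 0..n-1 index, which does not
# match df's labels, so the assigned columns would be all NaN.
# Passing df.index keeps each pair on the row it came from.
df[['b', 'c']] = pd.DataFrame(pairs, index=df.index)
print(df)
```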
Performance (the constructor approach is roughly 230x faster in this test):
# 30k rows ([10, -1, 9] repeated 10,000 times)
df = pd.DataFrame({'a': [10, -1, 9] * 10000})
In [79]: %timeit df[['b', 'c']] = pd.DataFrame(df.a.apply(compute_b_c).tolist())
22.6 ms ± 285 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [80]: %timeit df[['b', 'c']] = df.apply(lambda row: compute_b_c(row['a']), result_type='expand', axis=1)
5.25 s ± 84.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
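Most of that gap likely comes from DataFrame.apply with axis=1 handing every row to the lambda as a separate Series. For this particular function, which only thresholds a at 0, a vectorized comparison avoids apply altogether. This is a different technique from both answers and only fits because compute_b_c is this simple; a minimal sketch on the same 30k-row frame (no timing is claimed here):

```python
import pandas as pd

df = pd.DataFrame({'a': [10, -1, 9] * 10000})

# compute_b_c is just "1 if a > 0 else 0" for both outputs, so the same
# values come from one vectorized comparison with no Python-level loop
flag = (df['a'] > 0).astype(int)
df['b'] = flag
df['c'] = flag
print(df.head(3))
```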