Solal

Reputation: 751

Pandas DataFrame apply to multiple columns

I am trying to use the apply function on my DataFrame. The apply uses a custom function that returns 2 values, which need to populate 2 columns of my DataFrame.

I put a simple example below:

import pandas as pd
df = pd.DataFrame({'a': [10]})

I wish to create two columns, b and c: b equals 1 if a is above 0 (otherwise 0), and c equals 1 if a is above 0 (otherwise 0).

def compute_b_c(a):
    # Both flags are 1 when a is positive, otherwise both are 0
    if a > 0:
        return 1, 1
    else:
        return 0, 0

I tried this, but it raises a KeyError:

df[['b', 'c']] = df.a.apply(compute_b_c)
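To see why the assignment fails, it helps to look at what df.a.apply(compute_b_c) actually returns: a single Series whose values are tuples, not two separate columns. The sketch below assumes a three-row frame like the one in the answers; the zip unpacking at the end is one common way to split the tuples and is not taken from either answer.

```python
import pandas as pd

def compute_b_c(a):
    # Return a pair of flags: (b, c)
    if a > 0:
        return 1, 1
    return 0, 0

df = pd.DataFrame({'a': [10, -1, 9]})

# Series.apply returns ONE Series whose elements are tuples
result = df.a.apply(compute_b_c)
print(type(result).__name__)  # Series
print(result.tolist())        # [(1, 1), (0, 0), (1, 1)]

# zip(*result) regroups the tuples into (all b values) and (all c values),
# which can then be assigned to two new columns
df['b'], df['c'] = zip(*result)
print(df)
```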

Upvotes: 1

Views: 33

Answers (2)

nishant

Reputation: 925

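Use the result_type parameter of pandas.DataFrame.apply. With result_type='expand', list-like results (such as the returned tuples) are expanded into columns. This parameter is available only when you call apply on the DataFrame (df), not on the Series (df.a).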

df[['b', 'c']] = df.apply(lambda row: compute_b_c(row['a']), result_type='expand', axis=1)
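A minimal, self-contained sketch of this answer, assuming the same three-row frame used in the other answer, showing what result_type='expand' produces before the assignment:

```python
import pandas as pd

def compute_b_c(a):
    # Return a pair of flags: (b, c)
    if a > 0:
        return 1, 1
    return 0, 0

df = pd.DataFrame({'a': [10, -1, 9]})

# axis=1 passes each row to the lambda; result_type='expand' turns each
# returned tuple into one row of a new DataFrame with columns 0 and 1
expanded = df.apply(lambda row: compute_b_c(row['a']), axis=1, result_type='expand')
print(expanded.columns.tolist())  # [0, 1]

# Assigning to a list of new column names matches columns by position
df[['b', 'c']] = expanded
print(df)
```

Because the expanded columns 0 and 1 are matched to 'b' and 'c' by position, the order of values in the returned tuple decides which column receives which value.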

Upvotes: 0

jezrael

Reputation: 862511

It is possible with the DataFrame constructor. Also note that returning 1,1 and 0,0 is the same as returning the tuples (1,1) and (0,0):

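import pandas as pd

df = pd.DataFrame({'a': [10, -1, 9]})

def compute_b_c(a):
    # 1, 1 and 0, 0 are returned as the tuples (1, 1) and (0, 0)
    if a > 0:
        return (1, 1)
    else:
        return (0, 0)

# tolist() gives a list of tuples; the constructor makes one row per tuple
df[['b', 'c']] = pd.DataFrame(df.a.apply(compute_b_c).tolist())
print(df)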
    a  b  c
0  10  1  1
1  -1  0  0
2   9  1  1
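The constructor approach works because tolist() turns the Series of tuples into a list of tuples, and pd.DataFrame treats each tuple as one row. One caveat: the new frame gets a default 0..n-1 index, and the assignment aligns on index labels, so if df has a non-default index (for example after filtering) the new columns would come out as NaN. Passing index=df.index, an addition that is not part of the original answer, keeps the rows aligned. A minimal sketch with a deliberately non-default index:

```python
import pandas as pd

def compute_b_c(a):
    # Return a pair of flags: (b, c)
    if a > 0:
        return (1, 1)
    return (0, 0)

# Non-default index, e.g. what remains after filtering a larger frame
df = pd.DataFrame({'a': [10, -1, 9]}, index=[5, 7, 9])

pairs = df.a.apply(compute_b_c).tolist()  # [(1, 1), (0, 0), (1, 1)]

# Reuse the original index so rows 5, 7, 9 line up with the new values
df[['b', 'c']] = pd.DataFrame(pairs, index=df.index)
print(df)
```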

Performance:

# 30k rows ([10, -1, 9] repeated 10,000 times)
df = pd.DataFrame({'a': [10, -1, 9] * 10000})

In [79]: %timeit df[['b', 'c']] = pd.DataFrame(df.a.apply(compute_b_c).tolist())
22.6 ms ± 285 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [80]: %timeit df[['b', 'c']] = df.apply(lambda row: compute_b_c(row['a']), result_type='expand', axis=1)
5.25 s ± 84.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
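To rerun the comparison outside IPython, the standard-library timeit module can stand in for the %timeit magic. This is a sketch that wraps both approaches in hypothetical helper functions (with_constructor and with_expand are illustrative names, not from the answer); absolute numbers depend on the machine and pandas version, so none are shown here.

```python
import timeit

import pandas as pd

def compute_b_c(a):
    # Return a pair of flags: (b, c)
    if a > 0:
        return (1, 1)
    return (0, 0)

# 30,000 rows: the three-value list repeated 10,000 times
df = pd.DataFrame({'a': [10, -1, 9] * 10000})

def with_constructor():
    # Series.apply + DataFrame constructor (the In [79] measurement above);
    # work on a copy so every run starts without columns b and c
    out = df[['a']].copy()
    out[['b', 'c']] = pd.DataFrame(out.a.apply(compute_b_c).tolist())
    return out

def with_expand():
    # Row-wise DataFrame.apply with result_type='expand' (In [80] above)
    out = df[['a']].copy()
    out[['b', 'c']] = out.apply(lambda row: compute_b_c(row['a']),
                                axis=1, result_type='expand')
    return out

# Sanity check: both approaches should fill b and c with the same values
assert (with_constructor()[['b', 'c']].values.tolist()
        == with_expand()[['b', 'c']].values.tolist())

for fn in (with_constructor, with_expand):
    seconds = timeit.timeit(fn, number=1)
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms")
```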

Upvotes: 1
