labjunky

Reputation: 831

Pandas apply function on dataframe over multiple columns

When I run the following code, I get a KeyError: ('a', 'occurred at index a'). How can I apply this function, or something similar, over the DataFrame without hitting this error?

Running python3.6, pandas v0.22.0

import numpy as np
import pandas as pd

def add(a, b):
    return a + b

df = pd.DataFrame(np.random.randn(3, 3), 
                  columns = ['a', 'b', 'c'])

df.apply(lambda x: add(x['a'], x['c']))

Upvotes: 6

Views: 1054

Answers (3)

Duccio Piovani

Reputation: 1460

You can try this:

import numpy as np
import pandas as pd

def add(row):
    # With axis=1, apply passes each row as a Series indexed by column name.
    return row.a + row.b

df = pd.DataFrame(np.random.randn(3, 3),
                  columns=['a', 'b', 'c'])

df.apply(add, axis=1)

Of course, you can substitute any function that takes the columns of df as its inputs.
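As one possible variation, the column names can be passed in rather than hard-coded, using the `args` parameter of `apply`. This is only a sketch: the helper name `add_columns` and the small integer frame are made up for illustration and are not part of the original answer.

```python
import pandas as pd

def add_columns(row, first, second):
    # Hypothetical helper: `row` is one row of the frame (a Series
    # indexed by column name); `first` and `second` are column labels.
    return row[first] + row[second]

# Small deterministic frame used only for this illustration.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# Extra positional arguments after the row are supplied through `args`.
result = df.apply(add_columns, axis=1, args=('a', 'c'))
print(result)
```

The same function can then be reused for any pair of columns, for example `args=('a', 'b')`.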

Upvotes: 0

jezrael

Reputation: 863531

I think you need the parameter axis=1 so that apply processes the DataFrame row by row:

axis: {0 or 'index', 1 or 'columns'}, default 0

0 or index: apply function to each column
1 or columns: apply function to each row

df = df.apply(lambda x: add(x['a'], x['c']), axis=1)
print(df)
0   -0.802652
1    0.145142
2   -1.160743
dtype: float64
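To see why the default fails, it may help to look at what the function receives in each case. The sketch below uses a small integer frame (an assumption, chosen so the values are predictable) instead of the random data from the question.

```python
import pandas as pd

# Small deterministic frame used only for this illustration.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# With axis=0 the function gets one column at a time. A column is a
# Series indexed by the row labels, so the label 'a' is not in it.
column_labels = list(df['a'].index)

# With axis=1 the function gets one row at a time. A row is a Series
# indexed by the column names, so x['a'] and x['c'] both exist.
row_labels = list(df.iloc[0].index)

# The default (axis=0) therefore raises KeyError for x['a'].
try:
    df.apply(lambda x: x['a'] + x['c'])
    raised_key_error = False
except KeyError:
    raised_key_error = True

# The row-wise call succeeds.
row_sums = df.apply(lambda x: x['a'] + x['c'], axis=1)

print(column_labels)
print(row_labels)
print(raised_key_error)
print(row_sums.tolist())
```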

Upvotes: 5

DeepSpace

Reputation: 81684

You don't even need apply; you can add the columns directly. The output will be a Series either way:

df = df['a'] + df['c']

For example:

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
df = df['a'] + df['c']
print(df)
#  0    6
#  1    8
#  dtype: int64
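A quick way to check that the direct addition matches the row-wise apply is to compute both on the same frame. In this sketch the result is stored in a new column rather than assigned back to `df`, so the original columns stay available; the column name `a_plus_c` is a made-up label for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# Vectorized addition: operates on whole columns at once.
vectorized = df['a'] + df['c']

# Row-wise apply: calls the lambda once per row.
row_wise = df.apply(lambda row: row['a'] + row['c'], axis=1)

# Both approaches give the same values.
same_values = vectorized.tolist() == row_wise.tolist()

# Keep the frame intact by writing the result to a new column
# (hypothetical name) instead of overwriting df.
df['a_plus_c'] = vectorized
print(same_values)
print(df)
```

On large frames the direct addition is usually much faster, because apply with axis=1 calls a Python function for every row.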

Upvotes: 0
