Reputation: 831
When I run the following code I get a KeyError: ('a', 'occurred at index a'). How can I apply this function, or something similar, over the DataFrame without encountering this issue?
Running Python 3.6, pandas v0.22.0
import numpy as np
import pandas as pd

def add(a, b):
    return a + b

df = pd.DataFrame(np.random.randn(3, 3),
                  columns=['a', 'b', 'c'])
df.apply(lambda x: add(x['a'], x['c']))
Upvotes: 6
Views: 1054
Reputation: 1460
You can try this:
import numpy as np
import pandas as pd

def add(row):
    # With axis=1, apply passes each row as a Series indexed by column name
    return row.a + row.b

df = pd.DataFrame(np.random.randn(3, 3),
                  columns=['a', 'b', 'c'])
df.apply(add, axis=1)
Of course, you can substitute any function that takes a row and uses the columns of df as its inputs.
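As a minimal sketch of that substitution: `weighted_sum` below is a hypothetical function made up for illustration, and the frame uses fixed values instead of random ones so the result can be checked by hand.

```python
import pandas as pd

def weighted_sum(row):
    # Hypothetical example function: any expression over the row's columns works
    return 2 * row['a'] - row['b'] + row['c']

# Fixed illustrative values so the result is easy to verify
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# axis=1 calls weighted_sum once per row
result = df.apply(weighted_sum, axis=1)
print(result.tolist())  # [4, 6]
```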
Upvotes: 0
Reputation: 863531
I think you need the parameter axis=1 so that apply processes the DataFrame row by row:
axis: {0 or 'index', 1 or 'columns'}, default 0
0 or index: apply function to each column
1 or columns: apply function to each row
df = df.apply(lambda x: add(x['a'], x['c']), axis=1)
print(df)
0   -0.802652
1    0.145142
2   -1.160743
dtype: float64
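To see why the default raises a KeyError, here is a small sketch (the frame and its values are only illustrative) that checks whether the label 'a' is present in the Series that apply hands to the function under each axis setting.

```python
import pandas as pd

# Fixed illustrative values, default integer row labels 0 and 1
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# axis=0 (default): each column is passed as a Series indexed by the row labels,
# so the label 'a' is not there and x['a'] fails
has_a_axis0 = df.apply(lambda x: 'a' in x.index)

# axis=1: each row is passed as a Series indexed by the column names
has_a_axis1 = df.apply(lambda x: 'a' in x.index, axis=1)

print(has_a_axis0.tolist())  # [False, False, False]
print(has_a_axis1.tolist())  # [True, True]

# With axis=1 the original lambda works as intended
row_sums = df.apply(lambda x: x['a'] + x['c'], axis=1)
print(row_sums.tolist())  # [6, 8]
```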
Upvotes: 5
Reputation: 81684
You don't even need apply; you can add the columns directly. The output will be a Series either way:
df = df['a'] + df['c']
For example:
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
df = df['a'] + df['c']
print(df)
# 0    6
# 1    8
# dtype: int64
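A quick sketch to check that the two approaches agree, using the random frame from the question (the variable names `via_apply` and `vectorized` are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 3), columns=['a', 'b', 'c'])

# Row-wise apply, one Python call per row
via_apply = df.apply(lambda x: x['a'] + x['c'], axis=1)

# Column arithmetic, computed on whole columns at once
vectorized = df['a'] + df['c']

# Both give a Series with the same index and, up to floating point, the same values
same = np.allclose(via_apply, vectorized)
print(same)
```

The column arithmetic avoids calling a Python function for every row, so it is usually the better choice when the operation can be written that way.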
Upvotes: 0