Reputation: 5385
I am trying to create a function that iterates through a pandas DataFrame row by row. I want to create a new column based on the values of other columns in each row. My original DataFrame could look like this:
df:
   A  B
0  1  2
1  3  4
2  2  2
Now I want to create a new column holding, at each index position, the value of column A minus column B, so that the result looks like this:
df:
   A  B  A-B
0  1  2   -1
1  3  4   -1
2  2  2    0
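To reproduce the example, the input DataFrame above can be built directly with the pandas DataFrame constructor. This is a minimal sketch; the dict-of-lists layout is just one way to write it:

```python
import pandas as pd

# The example input from the question: two integer columns, default 0..2 index
df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})
print(df)
```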
The solution I have works, but only when I do NOT use it in a function:
for index, row in df.iterrows():
    print(index)
    df['A-B'] = df['A'] - df['B']
This gives me the desired output, but when I try to use it as a function, I get an error.
def test(x):
    for index, row in df.iterrows():
        print(index)
        df['A-B'] = df['A'] - df['B']
    return df

df.apply(test)
ValueError: cannot copy sequence with size 4 to array axis with dimension 3
What am I doing wrong here and how can I get it to work?
Upvotes: 27
Views: 84248
Reputation: 31682
It's because the apply method works column by column by default; set axis to 1 if you want it to go through the rows instead:
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
- 0 or ‘index’: apply function to each column
- 1 or ‘columns’: apply function to each row
df.apply(test, axis=1)
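A quick way to see what apply hands to the function is to return the name of the Series it receives: with the default axis=0 each call gets a whole column (named by its column label), while axis=1 passes one row at a time (named by its index label). The helper show is hypothetical, used only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

def show(s):
    # s is a Series: a whole column when axis=0, a single row when axis=1
    return s.name

# axis=0 (default): the function is called once per column
col_names = df.apply(show)
# axis=1: the function is called once per row
row_labels = df.apply(show, axis=1)

print(col_names.tolist())   # ['A', 'B']
print(row_labels.tolist())  # [0, 1, 2]
```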
EDIT
I thought that you needed to do some complex manipulation with each row. If you just need to subtract the columns from each other:
df['A-B'] = df.A - df.B
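For the simple subtraction case, here is a minimal runnable version of that one-liner, reusing the example frame from the question. The subtraction works on whole columns at once, so neither a loop nor apply is needed:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

# Vectorized: subtract the whole columns element-wise in one step
df["A-B"] = df.A - df.B
print(df)
```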
Upvotes: 17
Reputation: 6276
As Anton indicated, you should call apply with the axis=1 parameter. However, it is then not necessary to loop through the rows as you did in the function test, since the apply documentation mentions:
Objects passed to functions are Series objects
So you could simplify the function to:
def test(x):
    x['A-B'] = x['A'] - x['B']
    return x
and then run (apply returns a new DataFrame, so assign the result back):
df = df.apply(test, axis=1)
Note that you named the parameter of test x, but never used x inside the function at all.
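Similarly, the loop in the question never used row, so each iteration simply reassigned the whole column. If an explicit loop is really wanted, one option is to create the column first and then write one cell per row with DataFrame.at, pandas' scalar accessor. This is a minimal sketch; it is far slower than the vectorized version on large frames:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

# Create the column up front so .at only overwrites existing cells
df["A-B"] = 0

# Compute each value from the current row and write it to that row's label
for index, row in df.iterrows():
    df.at[index, "A-B"] = row["A"] - row["B"]

print(df)
```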
Finally, note that you can do column-wise operations in pandas (i.e. without a for loop) simply by doing:
df['A-B']=df['A']-df['B']
Upvotes: 4