Reputation: 39
I ran into some issues when using a for-loop with if conditions to update a DataFrame. This is probably basic Python logic, but I couldn't find an explanation online, so I'd like to ask here.
For illustration purposes, let's look at a simple dataframe df:
   1  2
0  1  0
1  0  1
2  1  0
3  0  0
4  1  1
I wanted a third column based on the values of the first two columns: 1 where column 1 is 1 and column 2 is 0, and 0 otherwise.
Initially I wrote:
for i in range(len(df)):
    if df.loc[i,'1']==1 & df.loc[i,'2']==0:
        df.loc[i,'3']=1
    else:
        df.loc[i,'3']=0
But I got this:
   1  2    3
0  1  0  0.0
1  0  1  0.0
2  1  0  0.0
3  0  0  1.0
4  1  1  0.0
Then I found that it worked once I added parentheses around each condition. Instead of
if df.loc[i,'1']==1 & df.loc[i,'2']==0:
I used
if (df.loc[i,'1']==1) & (df.loc[i,'2']==0):
Why is this the case?
I also tested whether the parentheses are needed even when there is only one condition:
for i in range(len(df)):
    if df.loc[1,'2']==1:
        df.loc[1,'4']=0
    else:
        df.loc[1,'4']=1
This produced another problem: column 4 has missing values, and only the cell df.loc[1,'4'] was updated:
   1  2    3    4
0  1  0  1.0  NaN
1  0  1  0.0  0.0
2  1  0  1.0  NaN
3  0  0  0.0  NaN
4  1  1  0.0  NaN
I'm really baffled and this time adding the bracket doesn't change anything. Why is it like this?
In addition to these two problems, is my method of updating cell values wrong generally speaking?
Upvotes: 1
Views: 639
Reputation: 367
If column 1 is equal to 1 and column 2 is equal to 0, put the value 1 in column 3:
df.loc[(df["1"] == 1) & (df["2"] == 0), "3"] = 1
If column 1 is not equal to 1 or column 2 is not equal to 0, put the value 0 in column 3:
df.loc[(df["1"] != 1) | (df["2"] != 0), "3"] = 0
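As a self-contained check, here is a minimal sketch of these two assignments. The sample frame is rebuilt from the question, and the string column labels "1" and "2" are an assumption.

```python
import pandas as pd

# Sample data rebuilt from the question; string column labels are an assumption.
df = pd.DataFrame({"1": [1, 0, 1, 0, 1], "2": [0, 1, 0, 0, 1]})

# Rows where both conditions hold get 1.
df.loc[(df["1"] == 1) & (df["2"] == 0), "3"] = 1
# Every remaining row (the logical complement of the mask above) gets 0.
df.loc[(df["1"] != 1) | (df["2"] != 0), "3"] = 0

print(df)
```

The two masks are complements of each other, so every row is assigned. Column 3 will likely come out as float, because it briefly holds NaN after the first assignment; appending .astype(int) should give integers if those are wanted.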
Upvotes: 0
Reputation: 863246
A vectorized solution is to combine the two masks with & (bitwise AND) and convert the result to integers, which maps True and False to 1 and 0:
df['3'] = ((df['1'] == 1) & (df['2'] == 0)).astype(int)
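Your loop works with scalars, so use and instead of &, which is intended for arrays. In the original condition, & binds more tightly than ==, so 1 & df.loc[i,'2'] was evaluated first. The loop itself is not recommended: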
for i in range(len(df)):
    if df.loc[i,'1']==1 and df.loc[i,'2']==0:
        df.loc[i,'3']=1
    else:
        df.loc[i,'3']=0
print(df)
   1  2    3
0  1  0  1.0
1  0  1  0.0
2  1  0  1.0
3  0  0  0.0
4  1  1  0.0
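To see why the parentheses matter, here is a small sketch using plain Python integers in place of the DataFrame cells. The names a and b are illustrative stand-ins for df.loc[0,'1'] and df.loc[0,'2'].

```python
# Scalars standing in for df.loc[0, '1'] and df.loc[0, '2'] from the first row.
a, b = 1, 0

# & binds tighter than ==, so this parses as a == (1 & b) == 0,
# a chained comparison: (a == (1 & b)) and ((1 & b) == 0).
without_parens = a == 1 & b == 0

# Parentheses force each comparison to be evaluated before &.
with_parens = (a == 1) & (b == 0)

# `and` has lower precedence than ==, so no parentheses are needed.
with_and = a == 1 and b == 0

print(without_parens, with_parens, with_and)  # False True True
```

With a, b = 0, 0 the unparenthesized form evaluates to True, which matches the single 1.0 in row 3 of the question's first output.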
Upvotes: 2
Reputation: 261924
Don't use a loop; this is an anti-pattern in pandas. Use:
df['3'] = (df['1'].eq(1) & df['2'].eq(0)).astype(int)
df['4'] = df['2'].ne(1).astype(int)
# or, if only 0/1
# df['4'] = 1 - df['2']
Also, using eq in place of == avoids the need to wrap each comparison in parentheses to respect operator precedence.
Output:
   1  2  3  4
0  1  0  1  1
1  0  1  0  0
2  1  0  1  1
3  0  0  0  1
4  1  1  0  0
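On the second question, the NaN values appear to come from the loop body indexing with the literal 1 instead of the loop variable i: every iteration rewrites row 1, and the other rows of column 4 are never assigned. A minimal sketch, assuming the sample frame from the question, compares the loop written with i against the vectorized expression:

```python
import pandas as pd

# Sample data rebuilt from the question; string column labels are an assumption.
df = pd.DataFrame({"1": [1, 0, 1, 0, 1], "2": [0, 1, 0, 0, 1]})

# Loop version using the loop variable i in place of the hard-coded 1.
for i in range(len(df)):
    if df.loc[i, "2"] == 1:
        df.loc[i, "4"] = 0
    else:
        df.loc[i, "4"] = 1

# Vectorized equivalent: 1 where column 2 is not 1, else 0.
vectorized = df["2"].ne(1).astype(int)

print(df["4"].astype(int).tolist())  # [1, 0, 1, 1, 0]
print(vectorized.tolist())           # [1, 0, 1, 1, 0]
```

Both give the same values, but the vectorized form avoids the row-by-row writes and the float column they produce.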
Upvotes: 1
Reputation: 6574
You'd be better off using np.where:
import numpy as np
df['3'] = np.where((df['1']==1) & (df['2']==0), 1, 0)
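A minimal, self-contained sketch of the np.where approach, assuming the sample frame from the question. Each comparison is wrapped in parentheses because & binds more tightly than ==; without them, the expression becomes a chained comparison on Series and raises a ValueError about an ambiguous truth value.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; string column labels are an assumption.
df = pd.DataFrame({"1": [1, 0, 1, 0, 1], "2": [0, 1, 0, 0, 1]})

# np.where(condition, value_if_true, value_if_false), evaluated element-wise.
df["3"] = np.where((df["1"] == 1) & (df["2"] == 0), 1, 0)

print(df["3"].tolist())  # [1, 0, 1, 0, 0]
```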
Upvotes: 0