Reputation: 39

Updating cell values with if conditions in a pandas dataframe

I ran into some issues when I used a for-loop and if conditions to update a dataframe. This should be very basic Python logic, but I couldn't find an explanation online, so I'd like to ask here.

For illustration purposes, let's look at a simple dataframe df with integer columns '1' and '2' (their values appear in the outputs below).

I wanted a third column, '3', that is 1 when column '1' equals 1 and column '2' equals 0, and 0 otherwise.
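The dataframe itself is not shown here; the values of columns '1' and '2' can be read off the printed outputs further down. A minimal sketch that rebuilds it, assuming integer values and the string column labels '1' and '2' that the code below uses:

```python
import pandas as pd

# Sample data reconstructed from the printed outputs; the column labels are strings
df = pd.DataFrame({'1': [1, 0, 1, 0, 1],
                   '2': [0, 1, 0, 0, 1]})
print(df)
```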

Initially I wrote:

for i in range(len(df)):
    if df.loc[i,'1']==1 & df.loc[i,'2']==0:
        df.loc[i,'3']=1
    else:
        df.loc[i,'3']=0

But I got this:

   1  2    3
0  1  0  0.0
1  0  1  0.0
2  1  0  0.0
3  0  0  1.0
4  1  1  0.0

Then I found that it worked when I added brackets around each condition: instead of if df.loc[i,'1']==1 & df.loc[i,'2']==0: I used if (df.loc[i,'1']==1) & (df.loc[i,'2']==0):

So why is this the case?

I was also testing whether I always need the brackets, even when I have only one condition:

for i in range(len(df)):
    if df.loc[1,'2']==1:
        df.loc[1,'4']=0
    else:
        df.loc[1,'4']=1

Another problem occurred: only the cell df.loc[1,'4'] was updated, and the rest of column 4 is missing values:

    1   2   3   4
0   1   0   1.0 NaN
1   0   1   0.0 0.0
2   1   0   1.0 NaN
3   0   0   0.0 NaN
4   1   1   0.0 NaN

I'm really baffled, and this time adding brackets doesn't change anything. Why is this?

Beyond these two problems, is my approach to updating cell values wrong in general?

Upvotes: 1

Answers (4)

phœnix

Reputation: 367

If column 1 is equal to 1 and column 2 is equal to 0, put the value 1 in column 3:

df.loc[(df["1"] == 1)&(df["2"] == 0), "3"] = 1

If column 1 is not equal to 1 or column 2 is not equal to 0, put the value 0 in column 3:

df.loc[(df["1"] != 1)|(df["2"] != 0), "3"] = 0
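The second mask is the logical negation of the first (De Morgan's law: not (A and B) is (not A) or (not B)), so it can also be written with ~. A sketch on the sample data from the question; mask is a hypothetical helper name. The first assignment creates column '3' with NaN in the unmatched rows, so the column ends up as float, as in the question's output.

```python
import pandas as pd

df = pd.DataFrame({'1': [1, 0, 1, 0, 1],
                   '2': [0, 1, 0, 0, 1]})

# Boolean Series: True where column '1' is 1 and column '2' is 0
mask = (df['1'] == 1) & (df['2'] == 0)

# The "not equal ... or ... not equal" mask is exactly the complement of mask
assert ((df['1'] != 1) | (df['2'] != 0)).equals(~mask)

df.loc[mask, '3'] = 1   # creates column '3'; unmatched rows get NaN for now
df.loc[~mask, '3'] = 0  # fills every remaining row
print(df)
```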

Upvotes: 0

jezrael

Reputation: 863246

A vectorized solution: combine the two masks with & (bitwise AND) and convert the result to integers, which maps True/False to 1/0:

df['3'] = ((df['1'] == 1) & (df['2'] == 0)).astype(int)

Your solution works with scalars, so use and instead of &, which is meant for arrays (looping like this is not recommended):

for i in range(len(df)):
    if df.loc[i,'1']==1 and df.loc[i,'2']==0:
        df.loc[i,'3']=1
    else:
        df.loc[i,'3']=0


print (df)
   1  2    3
0  1  0  1.0
1  0  1  0.0
2  1  0  1.0
3  0  0  0.0
4  1  1  0.0
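As for why the version without brackets misbehaves: in Python, & binds more tightly than ==, so a == 1 & b == 0 is parsed as the chained comparison a == (1 & b) == 0, i.e. (a == (1 & b)) and ((1 & b) == 0). A small sketch with plain integers (the row values from the sample data) reproduces the question's wrong column '3':

```python
# Row values of columns '1' and '2' from the sample data
rows = [(1, 0), (0, 1), (1, 0), (0, 0), (1, 1)]

for a, b in rows:
    unbracketed = a == 1 & b == 0     # parsed as a == (1 & b) == 0
    explicit = a == (1 & b) == 0      # the same expression, spelled out
    intended = (a == 1) and (b == 0)  # what was meant
    assert unbracketed == explicit
    print(a, b, int(unbracketed), int(intended))
```

Only the row (0, 0) comes out True in the unbracketed form, which matches the 1.0 in row 3 of the question's output.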

Upvotes: 2

mozway

Reputation: 261924

Don't use a loop; it is an anti-pattern in pandas. Use:

df['3'] = (df['1'].eq(1) & df['2'].eq(0)).astype(int)

df['4'] = df['2'].ne(1).astype(int)
# or, if column '2' only contains 0/1
# df['4'] = 1 - df['2']

Also, using eq in place of == avoids the need to wrap each comparison in parentheses to respect operator precedence.

Output:

   1  2  3  4
0  1  0  1  1
1  0  1  0  0
2  1  0  1  1
3  0  0  0  1
4  1  1  0  0
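Regarding the question's second problem (column '4'): that loop body uses the literal row label 1 instead of the loop variable i, so all five iterations write to the same cell df.loc[1,'4'] and the other rows stay NaN; brackets play no role there. If a loop is kept anyway, a sketch with i on the sample data (assuming the default 0..n-1 index, so that range(len(df)) matches the row labels) gives the same column '4' as the vectorized line above, only as floats:

```python
import pandas as pd

df = pd.DataFrame({'1': [1, 0, 1, 0, 1],
                   '2': [0, 1, 0, 0, 1]})

for i in range(len(df)):
    # Index with the loop variable i, not the literal label 1
    if df.loc[i, '2'] == 1:
        df.loc[i, '4'] = 0
    else:
        df.loc[i, '4'] = 1

print(df)
```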

Upvotes: 1

gtomer

Reputation: 6574

It is better to use np.where (each comparison needs its own parentheses):

import numpy as np
df['3'] = np.where((df['1'] == 1) & (df['2'] == 0), 1, 0)

Upvotes: 0
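A sketch of why the parentheses matter inside np.where (unless .eq is used): without them, df['1']==1 & df['2']==0 becomes the chained comparison df['1'] == (1 & df['2']) == 0, which Python evaluates with an implicit and, and pandas refuses to turn a Series into a single bool. Assuming the sample data from the question; caught is a hypothetical variable name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'1': [1, 0, 1, 0, 1],
                   '2': [0, 1, 0, 0, 1]})

# Parenthesized: element-wise AND of two boolean Series
df['3'] = np.where((df['1'] == 1) & (df['2'] == 0), 1, 0)
print(df)

# Unparenthesized: the implicit `and` of the chained comparison calls bool() on a Series
caught = None
try:
    np.where(df['1'] == 1 & df['2'] == 0, 1, 0)
except ValueError as err:
    caught = err  # "The truth value of a Series is ambiguous..."
print(type(caught).__name__, caught)
```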
