glpsx

Reputation: 679

Conditionally change the values of a Series based on the values of other columns

I'm experiencing/learning Python with a DataFrame having the following structure:

import numpy as np
import pandas as pd

df = pd.DataFrame({"left_color"  : ["red", "green", "blue", "black", "white", ""],
                   "right_color" : ["red", "gray", "", "black", "red", ""],
                   "flag"        : [1, 2, 3, 1, 2, 3]})
print(df)

  left_color right_color  flag
0        red         red     1
1      green        gray     2
2       blue                 3
3      black       black     1
4      white         red     2
5                            3

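My goal is to conditionally change the values of the flag Series based on the values of the left_color and right_color columns. Specifically: flag should become NaN if either color is an empty string, 0 if the two colors differ, and keep its current value otherwise.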

Here's my attempt:

def myfunc(left_side, right_side, value):
    if (left_side == "") | (right_side == ""):
        value = np.nan
    if left_side != right_side:
        value = 0
df["flag"] = df.apply(lambda x: myfunc(x["left_color"], x["right_color"], x["flag"]), axis = 1)
print(df)

  left_color right_color  flag
0        red         red  None
1      green        gray  None
2       blue              None
3      black       black  None
4      white         red  None
5                         None

As you can see, the result I'm getting is not the one I initially described. Instead, I'm getting None values everywhere. Here's my desired result:

  left_color right_color  flag
0        red         red     1
1      green        gray     0
2       blue               NaN
3      black       black     1
4      white         red     0
5                          NaN

I would like to understand what my mistake is and how to fix it. Additionally, I would like to know whether there is a more Pythonic, computationally more efficient way to solve this problem.

Upvotes: 1

Views: 39

Answers (3)

Michael Gardner

Reputation: 1803

You forgot to return the value in your function.

def myfunc(left_side, right_side, value):
    if (left_side == "") | (right_side == ""):
        return np.nan
    elif left_side != right_side:
        return 0
    else:
        return value
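
To see why the original version produced None everywhere: a Python function that ends without a return statement returns None, and rebinding the local name value inside it changes nothing for the caller. Here is a minimal, self-contained sketch of the difference. The helper names no_return and with_return are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"left_color": ["red", "green", "blue", "black", "white", ""],
                   "right_color": ["red", "gray", "", "black", "red", ""],
                   "flag": [1, 2, 3, 1, 2, 3]})

def no_return(left_side, right_side, value):
    # Rebinding the local name `value` is invisible to the caller, and
    # falling off the end of a function implicitly returns None.
    if left_side != right_side:
        value = 0

def with_return(left_side, right_side, value):
    # Check for empty strings first so ("blue", "") yields NaN, not 0.
    if left_side == "" or right_side == "":
        return np.nan
    if left_side != right_side:
        return 0
    return value

broken = df.apply(lambda x: no_return(x["left_color"], x["right_color"], x["flag"]), axis=1)
fixed = df.apply(lambda x: with_return(x["left_color"], x["right_color"], x["flag"]), axis=1)
```

Here broken holds None in every row, while fixed holds the values from the desired result.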

Upvotes: 1

Quang Hoang

Reputation: 150745

You want np.select:

df['flag'] = np.select((df.left_color.eq("")|df.right_color.eq(""),
                        df.left_color.ne(df.right_color)),
                       (np.nan, 0), 
                       default=df.flag)

Output:

  left_color right_color  flag
0        red         red   1.0
1      green        gray   0.0
2       blue               NaN
3      black       black   1.0
4      white         red   0.0
5                          NaN
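
One detail worth spelling out: np.select picks the choice for the first condition that is true, so the order of the conditions matters. Row 2 ("blue" vs "") satisfies both conditions and only becomes NaN because the empty-string check comes first. The flag column also becomes float because NaN cannot be stored in an integer column. Here is a small sketch of the ordering effect. The variable names is_empty, differs, empty_first and differs_first are illustrative, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"left_color": ["red", "green", "blue", "black", "white", ""],
                   "right_color": ["red", "gray", "", "black", "red", ""],
                   "flag": [1, 2, 3, 1, 2, 3]})

is_empty = df["left_color"].eq("") | df["right_color"].eq("")
differs = df["left_color"].ne(df["right_color"])

# Empty check first: rows 2 and 5 become NaN.
empty_first = np.select([is_empty, differs], [np.nan, 0], default=df["flag"])

# Mismatch check first: row 2 is caught by `differs` and becomes 0 instead.
differs_first = np.select([differs, is_empty], [0, np.nan], default=df["flag"])
```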

Upvotes: 1

moys

Reputation: 8033

You can use np.select as below. I think this is very likely to be faster than a custom function. Note that it sets rows with matching colors to 1 rather than keeping their existing flag value.

df.flag = np.select([df.left_color == '', df.right_color == '', df.right_color != df.left_color, df.right_color == df.left_color],
                    [np.nan, np.nan, 0, 1])

Output

  left_color right_color  flag
0        red         red   1.0
1      green        gray   0.0
2       blue               NaN
3      black       black   1.0
4      white         red   0.0
5                          NaN
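
The speed claim can be checked with a rough timing sketch like the one below. It makes several assumptions: the larger frame is built by repeating the six example rows, the helper names with_apply and with_select are hypothetical, and the np.select variant uses default=frame["flag"] so that matching rows keep their existing value instead of being set to 1. Timings depend on the machine and library versions, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"left_color": ["red", "green", "blue", "black", "white", ""],
                     "right_color": ["red", "gray", "", "black", "red", ""],
                     "flag": [1, 2, 3, 1, 2, 3]})
# Hypothetical larger frame: the six example rows repeated 2000 times.
big = pd.concat([base] * 2000, ignore_index=True)

def with_apply(frame):
    # Row-by-row Python function, called once per row.
    def rule(row):
        if row["left_color"] == "" or row["right_color"] == "":
            return np.nan
        if row["left_color"] != row["right_color"]:
            return 0
        return row["flag"]
    return frame.apply(rule, axis=1)

def with_select(frame):
    # Vectorized: conditions are evaluated on whole columns at once.
    left, right = frame["left_color"], frame["right_color"]
    return np.select([left.eq("") | right.eq(""), left.ne(right)],
                     [np.nan, 0], default=frame["flag"])

apply_seconds = timeit.timeit(lambda: with_apply(big), number=3)
select_seconds = timeit.timeit(lambda: with_select(big), number=3)
print(f"apply: {apply_seconds:.3f}s, np.select: {select_seconds:.3f}s")
```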

Upvotes: 1
