Reputation: 679
I'm learning Python and experimenting with a DataFrame that has the following structure:
import numpy as np
import pandas as pd

df = pd.DataFrame({"left_color": ["red", "green", "blue", "black", "white", ""],
                   "right_color": ["red", "gray", "", "black", "red", ""],
                   "flag": [1, 2, 3, 1, 2, 3]})
print(df)
  left_color right_color  flag
0        red         red     1
1      green        gray     2
2       blue                 3
3      black       black     1
4      white         red     2
5                            3
My goal is to conditionally change the values of the flag Series based on the values of the left_color and right_color columns. Specifically:
if left_color is missing or right_color is missing, change the flag value to numpy NaN;
if left_color is different from right_color, change the flag value to 0.
Here's my attempt:
def myfunc(left_side, right_side, value):
    if (left_side == "") | (right_side == ""):
        value = np.nan
    if left_side != right_side:
        value = 0

df["flag"] = df.apply(lambda x: myfunc(x["left_color"], x["right_color"], x["flag"]), axis = 1)
print(df)
  left_color right_color  flag
0        red         red  None
1      green        gray  None
2       blue              None
3      black       black  None
4      white         red  None
5                         None
As you can see, the result I'm getting is not the one I initially described. Instead, I'm getting None values everywhere. Here's my desired result:
  left_color right_color  flag
0        red         red     1
1      green        gray     0
2       blue               NaN
3      black       black     1
4      white         red     0
5                          NaN
I would like to understand what my mistake is and how to fix it. Additionally, I would like to know whether there is a more Pythonic way to solve this problem that is also computationally more efficient.
Upvotes: 1
Views: 39
Reputation: 1803
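You forgot to return the value in your function. A Python function that finishes without a return statement returns None, which is why every row ends up as None. Return the result from each branch: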
def myfunc(left_side, right_side, value):
    if (left_side == "") | (right_side == ""):
        return np.nan
    elif left_side != right_side:
        return 0
    else:
        return value
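To check the fix end to end, here is a minimal sketch that rebuilds the DataFrame from the question and applies the corrected function. It uses `or` instead of `|` because the arguments are plain Python values inside `apply`; that swap is my own choice and not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "left_color": ["red", "green", "blue", "black", "white", ""],
    "right_color": ["red", "gray", "", "black", "red", ""],
    "flag": [1, 2, 3, 1, 2, 3],
})

def myfunc(left_side, right_side, value):
    # A missing color on either side takes priority over the mismatch check.
    if left_side == "" or right_side == "":
        return np.nan
    # Both colors are present but differ.
    if left_side != right_side:
        return 0
    # Colors match: keep the original flag.
    return value

df["flag"] = df.apply(
    lambda row: myfunc(row["left_color"], row["right_color"], row["flag"]),
    axis=1,
)
print(df)
```

Because the column now contains NaN, pandas stores it as floats, so the kept values print as 1.0 and 0.0 rather than 1 and 0.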
Upvotes: 1
Reputation: 150745
You want np.select:
df['flag'] = np.select((df.left_color.eq("") | df.right_color.eq(""),
                        df.left_color.ne(df.right_color)),
                       (np.nan, 0),
                       default=df.flag)
Output:
  left_color right_color  flag
0        red         red   1.0
1      green        gray   0.0
2       blue               NaN
3      black       black   1.0
4      white         red   0.0
5                          NaN
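The order of the conditions matters here: where several conditions are true for the same row, np.select takes the choice belonging to the first one. A small sketch on the question's data illustrates this; the variable names (`missing`, `different`, `both`, `swapped`) are mine, introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "left_color": ["red", "green", "blue", "black", "white", ""],
    "right_color": ["red", "gray", "", "black", "red", ""],
    "flag": [1, 2, 3, 1, 2, 3],
})

missing = df["left_color"].eq("") | df["right_color"].eq("")
different = df["left_color"].ne(df["right_color"])

# Row 2 ("blue" vs "") is both missing and different.
both = missing & different
print(both.tolist())  # [False, False, True, False, False, False]

# "missing" is listed first, so row 2 becomes NaN.
flag = np.select([missing, different], [np.nan, 0], default=df["flag"])

# With the order swapped, row 2 would become 0 instead.
swapped = np.select([different, missing], [0, np.nan], default=df["flag"])

print(flag)
print(swapped)
```

Listing the missing-value check first is what reproduces the desired result from the question.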
Upvotes: 1
Reputation: 8033
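You can use np.select as below. I think this is very likely to be faster than a custom function applied row by row. Note that the last condition hard-codes 1 where the colors match, which coincides with the original flag values in the sample data; pass default=df.flag instead if the original values must be kept.
df.flag = np.select(
    [df.left_color == '', df.right_color == '', df.right_color != df.left_color, df.right_color == df.left_color],
    [np.nan, np.nan, 0, 1],
)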
Output:
  left_color right_color  flag
0        red         red   1.0
1      green        gray   0.0
2       blue               NaN
3      black       black   1.0
4      white         red   0.0
5                          NaN
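To check the speed claim on your own machine, a rough sketch like the following compares a row-wise apply with np.select on a larger random frame. The frame size, the color list, and the helper name `flag_by_row` are assumptions made for illustration, and no timing figures are quoted because they depend on hardware and library versions.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical test data: 20,000 rows drawn from a small color list.
rng = np.random.default_rng(0)
colors = ["red", "green", "blue", ""]
n = 20_000
df = pd.DataFrame({
    "left_color": rng.choice(colors, n),
    "right_color": rng.choice(colors, n),
    "flag": rng.integers(1, 4, n),
})

def flag_by_row(row):
    # Row-wise version of the rules from the question.
    if row["left_color"] == "" or row["right_color"] == "":
        return np.nan
    if row["left_color"] != row["right_color"]:
        return 0
    return row["flag"]

start = time.perf_counter()
slow = df.apply(flag_by_row, axis=1).to_numpy(dtype=float)
apply_seconds = time.perf_counter() - start

start = time.perf_counter()
missing = df["left_color"].eq("") | df["right_color"].eq("")
different = df["left_color"].ne(df["right_color"])
fast = np.select([missing, different], [np.nan, 0], default=df["flag"])
select_seconds = time.perf_counter() - start

# Both approaches should agree, treating NaN as equal to NaN.
print("same result:", np.allclose(slow, fast, equal_nan=True))
print(f"apply: {apply_seconds:.4f}s, np.select: {select_seconds:.4f}s")
```

A single run like this is only indicative; for a careful comparison, repeat the measurement several times, for example with timeit.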
Upvotes: 1