Reputation: 1155
My dataframe, called pw2, looks something like this. It has two columns, pw1 and pw2, which are win probabilities. I'd like to apply some conditional logic to create another column called WINNER based on pw1 and pw2.
+-------------------------+-------------+-----------+-------------+
| Name1 | pw1 | Name2 | pw2 |
+-------------------------+-------------+-----------+-------------+
| Seaking | 0.517184213 | Lickitung | 0.189236181 |
| Ferrothorn | 0.172510623 | Quagsire | 0.260884258 |
| Thundurus Therian Forme | 0.772536272 | Hitmonlee | 0.694069408 |
| Flaaffy | 0.28681284 | NaN | NaN |
+-------------------------+-------------+-----------+-------------+
I want to do this conditionally in a function but I'm having some trouble.
If pw1 > pw2, populate with Name1.
If pw2 > pw1, populate with Name2.
If pw1 is populated but pw2 isn't, populate with Name1.
If pw2 is populated but pw1 isn't, populate with Name2.
But my function isn't working - for some reason checking if a value is null isn't working.
def final_winner(df):
    # If PW1 is missing and PW2 is populated, Pokemon 1 wins
    if df['pw1'] = None and df['pw2'] != None:
        return df['Number1']
    # If it's the same thing but the other way around, Pokemon 2 wins
    elif df['pw2'] = None and df['pw1'] != None:
        return df['Number2']
    # If pw2 is greater than pw1, then Pokemon 2 wins
    elif df['pw2'] > df['pw1']:
        return df['Number2']
    else
        return df['Number1']

pw2['Winner'] = pw2.apply(final_winner, axis=1)
Upvotes: 5
Views: 15480
Reputation: 59274
Do not use apply, which is very slow. Use np.where:
pw2 = df.pw2.fillna(-np.inf)
df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)
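Since NaNs always lose, you can simply fillna() pw2 with -np.inf to get the same logic. A NaN in pw1 needs no special handling: any comparison with NaN is False, so np.where falls through to Name2.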
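As a quick check, here is a minimal sketch of the np.where approach run against the sample rows from the question. The frame is rebuilt by hand from the table above, and the names pw2_filled and winners are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan],
})

# A missing pw2 becomes -inf, so any populated pw1 beats it.
pw2_filled = df["pw2"].fillna(-np.inf)

# Vectorized choice: Name1 where pw1 is larger, otherwise Name2.
df["winner"] = np.where(df["pw1"] > pw2_filled, df["Name1"], df["Name2"])

winners = df["winner"].tolist()
print(winners)
# ['Seaking', 'Quagsire', 'Thundurus Therian Forme', 'Flaaffy']
```

Note that in this sketch a row where both pw1 and pw2 are missing would also get Name2, since the comparison is False for it.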
Looking at your code, we can point out several problems. First, you are comparing with df['pw1'] = None, which is invalid Python syntax for a comparison. You usually want to compare things using the == operator. However, for None, it is recommended to use is, such as if variable is None: (...). However again, you are in a pandas/numpy environment, where there are actually several null values (None, NaN, NaT, etc.).
So, it is preferable to check for nullability using pd.isnull() or df.isnull().
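To see why the None checks never fire, here is a small sketch comparing them with pd.isnull() on a missing float, which is how pandas stores the empty pw2 cell. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

missing = np.nan  # how a missing numeric cell is stored

eq_none = missing == None      # False: NaN is not None
is_none = missing is None      # False: still a float object
eq_self = missing == missing   # False: NaN is not even equal to itself

# pd.isnull() recognises all of the usual null markers.
nan_detected = bool(pd.isnull(missing))
none_detected = bool(pd.isnull(None))
nat_detected = bool(pd.isnull(pd.NaT))

# The Series method gives the same check element-wise.
mask = pd.Series([0.189236181, np.nan]).isnull().tolist()
print(eq_none, is_none, eq_self, nan_detected, none_detected, nat_detected, mask)
```

This is why an equality test against None silently returns False for every row, even the ones with missing values.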
Just to illustrate, this is what your code should look like:
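import pandas as pd

def final_winner(df):
    # With axis=1, df here is a single row of the dataframe.
    # Only pw2 is populated: Pokemon 2 wins
    if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
        return df['Name2']
    # Only pw1 is populated: Pokemon 1 wins
    elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
        return df['Name1']
    # Both are populated: the higher probability wins
    elif df['pw2'] > df['pw1']:
        return df['Name2']
    else:
        return df['Name1']

df['winner'] = df.apply(final_winner, axis=1)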
But again, definitely use np.where.
Upvotes: 8