Using conditional if/else logic with pandas dataframe columns

Question

My dataframe called pw2 looks something like this, where I have two columns, pw1 and pw2, which are probability of wins. I'd like to perform some conditional logic to create another column called WINNER based off pw1 and pw2.

+-------------------------+-------------+-----------+-------------+
|          Name1          |     pw1     |   Name2   |     pw2     |
+-------------------------+-------------+-----------+-------------+
| Seaking                 | 0.517184213 | Lickitung | 0.189236181 |
| Ferrothorn              | 0.172510623 | Quagsire  | 0.260884258 |
| Thundurus Therian Forme | 0.772536272 | Hitmonlee | 0.694069408 |
| Flaaffy                 | 0.28681284  | NaN       | NaN         |
+-------------------------+-------------+-----------+-------------+

I want to do this conditionally in a function but I'm having some trouble.

if pw1 > pw2, populate with Name1
if pw2 > pw1, populate with Name2
if pw1 is populated but pw2 isn't, populate with Name1
if pw2 is populated but pw1 isn't, populate with Name2

But my function isn't working - for some reason checking if a value is null isn't working.

def final_winner(df):
    # If PW1 is missing and PW2 is populated, Pokemon 1 wins
    if df['pw1'] = None and df['pw2'] != None:
        return df['Number1']
    # If it's the same thing but the other way around, Pokemon 2 wins
    elif df['pw2'] = None and df['pw1'] != None:
        return df['Number2']
    # If pw2 is greater than pw1, then Pokemon 2 wins
    elif df['pw2'] > df['pw1']:
        return df['Number2']
    else
        return df['Number1']

pw2['Winner'] = pw2.apply(final_winner, axis=1)

rafaelc · Accepted Answer

Do not use apply here; it runs row by row and is very slow. Use np.where instead:

import numpy as np
# Treat a missing pw2 as -inf so that it always loses the comparison
pw2 = df.pw2.fillna(-np.inf)
df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)

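Since NaNs always lose, you can just fillna() pw2 with -np.inf to get the same logic. A missing pw1 needs no special handling: any comparison with NaN is False, so np.where falls through to Name2.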
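To make this concrete, here is a minimal, self-contained sketch that rebuilds the four rows from the question's table and applies the same np.where logic. The frame is called df here (the question calls it pw2), and the helper name pw2_filled is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table in the question
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan],
})

# A missing pw2 becomes -inf, so any populated pw1 beats it
pw2_filled = df["pw2"].fillna(-np.inf)

# Pick Name1 where pw1 is strictly greater, otherwise Name2
df["winner"] = np.where(df["pw1"] > pw2_filled, df["Name1"], df["Name2"])

print(df[["Name1", "Name2", "winner"]])
```

Flaaffy wins its row because its opponent has no pw2 value.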

Looking at your code, we can point out several problems. First, you wrote df['pw1'] = None, which is invalid Python syntax for a comparison: = is assignment, and comparisons use the == operator. For None specifically, the recommended check is is, as in if variable is None: (...). But you are in a pandas/numpy environment, where there are actually several different null values (None, NaN, NaT, etc.), and neither == None nor is None catches NaN.

So, it is preferable to check for nullability using pd.isnull() or df.isnull().
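A quick sketch of why this matters, assuming only standard pandas and numpy behaviour: NaN is not None and is not even equal to itself, while pd.isnull() recognises all of the usual null markers. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# pd.isnull() treats None, NaN and NaT alike
null_flags = [bool(pd.isnull(v)) for v in (None, np.nan, pd.NaT)]
print(null_flags)

# Equality and identity checks both miss NaN
nan_equals_none = (np.nan == None)   # noqa: E711, deliberately the wrong check
nan_is_none = (np.nan is None)
nan_equals_nan = (np.nan == np.nan)
print(nan_equals_none, nan_is_none, nan_equals_nan)

# On a whole column, use the Series method
s = pd.Series([0.5, np.nan])
column_flags = s.isnull().tolist()
print(column_flags)
```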

Just to illustrate, this is how your code should look:

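def final_winner(df):
    # With axis=1, df is a single row here
    # pw1 is missing and pw2 is populated: Pokemon 2 wins
    if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
        return df['Name2']
    # pw2 is missing and pw1 is populated: Pokemon 1 wins
    elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
        return df['Name1']
    elif df['pw2'] > df['pw1']:
        return df['Name2']
    else:
        return df['Name1']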

df['winner'] = df.apply(final_winner, axis=1)

But again, definitely use np.where.
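As a sanity check, the sketch below runs a row-wise function that follows the four rules from the question next to the np.where version and compares the results. Two rows come from the question; the third row (PokemonA vs PokemonB, with pw1 missing) is hypothetical and is only there to exercise the "pw2 is populated but pw1 isn't" rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name1": ["Seaking", "Flaaffy", "PokemonA"],   # last row is hypothetical
    "pw1": [0.517184213, 0.28681284, np.nan],
    "Name2": ["Lickitung", np.nan, "PokemonB"],
    "pw2": [0.189236181, np.nan, 0.4],
})

def final_winner(row):
    # Only pw2 is populated: Pokemon 2 wins
    if pd.isnull(row["pw1"]) and not pd.isnull(row["pw2"]):
        return row["Name2"]
    # Only pw1 is populated: Pokemon 1 wins
    if pd.isnull(row["pw2"]) and not pd.isnull(row["pw1"]):
        return row["Name1"]
    return row["Name2"] if row["pw2"] > row["pw1"] else row["Name1"]

row_wise = df.apply(final_winner, axis=1)
vectorized = np.where(
    df["pw1"] > df["pw2"].fillna(-np.inf), df["Name1"], df["Name2"]
)

print(row_wise.tolist())
print(list(vectorized))
```

The two approaches agree on these rows. They differ only when both probabilities are missing: the row-wise function falls through to Name1, while np.where picks Name2.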
