mmward
mmward

Reputation: 15

Numpy Select with Multiple Conditions Not Returning Values

I combined three dataframes that have multiple email columns, then renamed the columns for ease of scripting. I'm trying to create a master email column based on conditions. If C is populated, use C. If not, and B is populated, use B. If not, and A is populated, use A. For some reason it's only filling in the master column if C is populated. Even if A is populated, it's returning a null value.

dfs = [df_A, d365, df_C]
from functools import reduce
dfFinal = reduce(lambda left,right: pd.merge(left,right,on='leadId'), dfs)

import numpy as np
emailconditions = [
dfFinal['EmailC'] is not None,
(dfFinal['EmailC'] is None) & (dfFinal['EmailB'] is not None),
(dfFinal['EmailC'] is None) & (dfFinal['EmailB'] is None)]

emailvalues = [
dfFinal['EmailC'],
dfFinal['EmailB'],
dfFinal['EmailA']]

dfFinal['emailFinal'] = np.select(emailconditions, emailvalues)

Upvotes: 1

Views: 3258

Answers (2)

ddejohn
ddejohn

Reputation: 8962

Here's an alternative solution, which grabs the first truthy value out of the columns in whichever priority order you want:

In [3]: df
Out[3]:
      a     b     c
0     x     w  None
1  None     y  None
2     k  None     z

In [4]: order = ("c", "b", "a")

In [5]: df.apply(lambda row: next(row[col] for col in order if row[col]), axis=1)
Out[5]:
0    w
1    y
2    z
dtype: object

If you anticipate having rows where none of the columns have a value, then you'd probably want something like this:

def first_truthy(row, order):
    try:
        return next(row[col] for col in order if row[col])
    except StopIteration:
        return None

Output:

In [7]: df
Out[7]:
      a     b     c
0     x     w  None
1  None     y  None
2     k  None     z
3  None  None  None


In [8]: df.apply(lambda row: first_truthy(row, order), axis=1)
Out[8]:
0       w
1       y
2       z
3    None
dtype: object

Not that this is likely slower than boolean masking, but (in my opinion) easier to reason about and debug, and doesn't require the extra dependency on numpy. If you need performance, @richardec's solution is likely vastly superior, though I have not benchmarked our solutions.

Upvotes: 0

user17242583
user17242583

Reputation:

Try setting emailconditions like so:

emailconditions = [
dfFinal['EmailC'].notna(),
dfFinal['EmailC'].isna() & dfFinal['EmailB'].notna(),
dfFinal['EmailC'].isna() & dfFinal['EmailB'].isna()]

Key point is to use notna() instead of is not None and isna() instead of is None.

Upvotes: 4

Related Questions