SpanishBoy
SpanishBoy

Reputation: 2215

Pythonic way to simplify multiply 2 pandas columns with condition

I am looking for a way how I can simplify below examples:

self.df[TARGET_NAME] = self.df.apply(lambda row: 1 if row['WINNER'] == 1 and row['WINNER_OVER_2_5'] == 1 else 0, axis=1)

like:

self.df[TARGET_NAME] = self.df[(self.df.WINNER == 1)] & self.df[(self.df.WINNER_OVER_2_5 == 1)] # not it's not correct

and more complex as below

df["PROFIT"] = np.where((df[TARGET_NAME] == df["PREDICTED"]) & (df["PREDICTED"] == 0),
                                  df['MATCH_HOME'] * df['HOME_STAKE'],
                                  np.where((dfml[TARGET_NAME] == df["PREDICTED"]) & (df["PREDICTED"] == 1),
                                           df['MATCH_DRAW'] * df['DRAW_STAKE'],
                                           np.where((df[TARGET_NAME] == df["PREDICTED"]) & (df["PREDICTED"] == 2),
                                                    df['MATCH_AWAY'] * df['AWAY_STAKE'],
                                                    -0))).astype(float)

Upvotes: 1

Views: 143

Answers (3)

miraculixx
miraculixx

Reputation: 10349

I am looking for a way how I can simplify below examples:

I guess you're looking for a simpler syntax. How about this:

df['MATCH'] = matches(df, values=(0,1), WINNER=1, WINNER_OVER_2_5=1)

Note that values= is optional and takes any tuple(false-value, true-value), defaulting to (False, True).

To get there it takes a bit of magic. Essentially this builds a truth table by chaining the conditions and transforming the result into the values as specified. It ends up doing the same thing as your lambda, just in a generic way.

def matches(df, values=None, **kwargs):
    values = values or (False, True)
    flt = None
    for var, value in kwargs.iteritems():
        t = (df[var] == value)
        flt = (flt & t) if flt is not None else t
    flt = flt.apply(lambda t : values[t])
    return flt

Upvotes: 1

steboc
steboc

Reputation: 1181

maybe you can try with boolean attribute

df= pd.DataFrame({'a':[1,0,1,0],'b' :[1,1,0,np.nan]})

df['NEW']= ((df['a']==1 ) &  (df['b']==1)).astype(int).fillna(0)

Upvotes: 0

jezrael
jezrael

Reputation: 862761

IIUC you can use isin:

print df
   WINNER  WINNER_OVER_2_5
0       1                0
1       1                1
2       0                2

df['TARGET_NAME'] = np.where((df.WINNER.isin([1]) & df.WINNER_OVER_2_5.isin([1])),1,0)
print df
   WINNER  WINNER_OVER_2_5  TARGET_NAME
0       1                0            0
1       1                1            1
2       0                2            0

EDIT (untested, because no data):

df["PROFIT"] = np.where((df[TARGET_NAME] == df["PREDICTED"]) & (df["PREDICTED"].isin([0])),
                                  df['MATCH_HOME'] * df['HOME_STAKE'],
                                  np.where((dfml[TARGET_NAME] == df["PREDICTED"]) & (df["PREDICTED"].isin([1])),
                                           df['MATCH_DRAW'] * df['DRAW_STAKE'],
                                           np.where((df[TARGET_NAME] == df["PREDICTED"]) & (df["PREDICTED"].isin([2])),
                                                    df['MATCH_AWAY'] * df['AWAY_STAKE'],
                                                    0))).astype(float)

Upvotes: 1

Related Questions