ZacAttack
ZacAttack

Reputation: 1055

What is a better Alternative to the np.where() function when adding data to pandas DataFrame based on multiple criteria

Overview
I am building a function that tests the profitability of my investment strategy. The strategy is simple; if the stock closes down for the day I will bet it up the next day, if it closes up for the day I will bet it down the next day. It is basically a contrarian strategy.

Approach
I am using the np.where (numpy) function to determine weather or not it was a winning or losing trade. If the stock closed down today and then went up the next day it is a "Win", or if it went down a "Loss". If it went up today and then down tomorrow "Win" if it went up today and then up the next day "Loss"

Code

df = pd.DataFrame({'Day':['Mon','Tues','Wed','Thurs','Fri','Mon','Tues','Wed','Thurs','Fri','Mon','Tues','Wed','Thurs','Fri'],
                   'DaysGainOrLoss':[0.02,-0.05,0.01,0.02,-0.01,-0.03,0.03,-0.01,0.01,0.03,-0.03,-0.05,-0.01,0.03,0.02]})

df['NextDaysGainOrLoss'] = df['DaysGainOrLoss'].shift(-1)
df['WinOrLoss'] = np.where(df.NextDaysGainOrLoss > 0,np.where(df.DaysGainOrLoss <=0, "Win","Loss"),np.where(df.DaysGainOrLoss > 0,"Win","Loss"))

print(df)  

Output

      Day  DaysGainOrLoss  NextDaysGainOrLoss WinOrLoss
0     Mon            0.02               -0.05       Win
1    Tues           -0.05                0.01       Win
2     Wed            0.01                0.02      Loss
3   Thurs            0.02               -0.01       Win
4     Fri           -0.01               -0.03      Loss
5     Mon           -0.03                0.03       Win
6    Tues            0.03               -0.01       Win
7     Wed           -0.01                0.01       Win
8   Thurs            0.01                0.03      Loss
9     Fri            0.03               -0.03       Win
10    Mon           -0.03               -0.05      Loss
11   Tues           -0.05               -0.01      Loss
12    Wed           -0.01                0.03      Loss
13  Thurs            0.03                0.02      Loss
14    Fri            0.02                 NaN       Win

Problem
This code works for a binary outcome either it is a win or a loss. The problem becomes what if I add a more complex criteria i.e. lets say now I only want to trade it the DaysGainOrLoss is >= .05 or < = -.05. Now Instead of Win and Loss I now need NoTrade to appear when the criteria is not met. The np.where was fine with the binary outcome but now this getting way to complex. I know there has to be a better way to managed this but I am not for sure what it is.

Expectations

      Day  DaysGainOrLoss  NextDaysGainOrLoss WinOrLoss
0     Mon            0.02               -0.05   NoTrade
1    Tues           -0.05                0.01   Win
2     Wed            0.01                0.02   NoTrade
3   Thurs            0.02               -0.01   NoTrade
4     Fri           -0.01               -0.03   NoTrade
5     Mon           -0.03                0.03   NoTrade
6    Tues            0.03               -0.01   NoTrade
7     Wed           -0.01                0.01   NoTrade
8   Thurs            0.01                0.03   NoTrade
9     Fri            0.03               -0.03   NoTrade
10    Mon           -0.03               -0.05   NoTrade
11   Tues           -0.05               -0.01   Loss
12    Wed           -0.01                0.03   NoTrade
13  Thurs            0.03                0.02   NoTrade
14    Fri            0.02                 NaN   NoTrade

Upvotes: 0

Views: 1606

Answers (1)

Sebastian Wozny
Sebastian Wozny

Reputation: 17506

You can use apply to apply arbitrary tansformations:

def myfunc(x):
    if x > 0.5: 
        return 'Win'
    if x < 0: 
        return 'Loss'
    return 'No Trade'
df['action'] = df[0].apply(myfunc)
print(df)

Output

                            0    action
2011-01-01 00:00:00 -0.324398      Loss
2011-01-01 01:00:00 -0.761585      Loss
2011-01-01 02:00:00  0.057204  No Trade
2011-01-01 03:00:00 -1.162510      Loss
2011-01-01 04:00:00 -0.680896      Loss
2011-01-01 05:00:00 -0.701835      Loss
2011-01-01 06:00:00 -0.431338      Loss
2011-01-01 07:00:00  0.306935  No Trade
2011-01-01 08:00:00 -0.503177      Loss
2011-01-01 09:00:00 -0.507444      Loss
2011-01-01 10:00:00  0.230590  No Trade
2011-01-01 11:00:00 -2.326702      Loss
2011-01-01 12:00:00 -0.034664      Loss
2011-01-01 13:00:00  0.224373  No Trade
2011-01-01 14:00:00 -0.242884      Loss
2011-01-01 15:00:00 -0.134757      Loss
2011-01-01 16:00:00 -1.177362      Loss
2011-01-01 17:00:00  0.931335       Win

You can make myfunc arbitrarily complex.

Upvotes: 1

Related Questions