Martingale
Martingale

Reputation: 523

How to create a simple flag in Python

After getting data of the SPY from Yahoo I create a channel of closing prices using, as you can see below, the Max and Min rolling window. The columns are HC and HL.

I need to create a column (which I have called flag) that shows 1 when the close price is equal to HC and this value continues until close will be equal to HL. At this point the value of Flag will be -1. As you can see it is very simple, the Flag can have just two values; 1 or -1.

The simple formula would be something like:

  1. if Close == HC then flag is 1
  2. if Close == HL then flag is -1
  3. if Close != HC and Close !=HL then flag is equal to the last value saved on the variable flag.

I have tried several things, including the code below, with no luck. The problem with this code is that the 0 value is shown. And I don't know how to make it disappear thru a conditional:

import pandas as pd
import pandas_datareader as dr
import numpy as np
from datetime import date

df = dr.data.get_data_yahoo('SPY',start='01-01-2019',end=date.today())

df['HC'] = df['Close'].rolling(20).max() 
df['LC'] = df['Close'].rolling(20).min() 

df['Flag'] = [1 if (df.loc[ei, 'Close'] == df.loc[ei, 'HC']) else 
              -1 if (df.loc[ei, 'Close'] == df.loc[ei, 'LC']) else   
              0 for ei in df.index]

Below you can see in blue the results of my code and in red the result that I need.

Image

Is there a simple way to do this? I would appreciate if someone could help me with it. Thank you!

Upvotes: 5

Views: 11657

Answers (3)

Tim Mironov
Tim Mironov

Reputation: 859

You can use more built-in functionality of Pandas, specifically fillna method and usage of logical indexing. I've added a portion of code to your original one to create an additional Good_Flag:

import pandas_datareader as dr
from datetime import date

df = dr.data.get_data_yahoo('SPY',start='01-01-2019',end=date.today())

df['HC'] = df['Close'].rolling(20).max() 
df['LC'] = df['Close'].rolling(20).min() 

df['Flag'] = [1 if (df.loc[ei, 'Close'] == df.loc[ei, 'HC']) else 
              -1 if (df.loc[ei, 'Close'] == df.loc[ei, 'LC']) else   
              0 for ei in df.index]

pos_indexes = df.Close == df.HC
neg_indexes = df.Close == df.LC
df.loc[pos_indexes, 'Good_Flag'] = 1
df.loc[neg_indexes, 'Good_Flag'] = -1
df = df.fillna(method='ffill')

Pay attention that I used the fillna method with ffill attribute to specify a "Forward Pass".

Edit: To make thigs clear, the original Flag column is left here on purpose and the calculation of the new Good_Flag column doesn't rely on Flag.

To demonstrate the desirable behaviour:

plt.plot(df.Flag, color='blue')
plt.plot(df.Good_Flag, color='red')
plt.legend(['Flag', 'Good_Flag'])
plt.show()

enter image description here

Upvotes: 2

iDrwish
iDrwish

Reputation: 3103

Although this is already answered, however, the fastest way to compute such result is usually using np.where as follows:

import pandas as pd
import pandas_datareader as dr
import numpy as np
from datetime import date

df = dr.data.get_data_yahoo('SPY',start='01-01-2019',end=date.today())

df['HC'] = df['Close'].rolling(20).max() 
df['LC'] = df['Close'].rolling(20).min() 

There is a nested logic in the following:

  1. Create an empty array
  2. Replace Values with -1 under condition
  3. Replace Values with 1 under condition
df['Flag'] = np.where((df.Close == df.HC), 1, 
         np.where(df.Close == df.LC, -1, np.full(df.Close.count(), np.nan)))
df.Flag.fillna(method='ffill', inplace=True)

In terms of performance:

%%timeit
df['Flag'] = np.where((df.Close == df.HC), 1, 
         np.where(df.Close == df.LC, -1, np.full(df.Close.count(), np.nan)))
df.Flag.fillna(method='ffill', inplace=True)
912 µs ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This is definitely better that loops or nested if conditions.

For example @Tim Mironov answer:

%%timeit
pos_indexes = (df.Close == df.HC)
neg_indexes = (df.Close == df.LC)

df.loc[pos_indexes, 'Good_Flag'] = 1
df.loc[neg_indexes, 'Good_Flag'] = -1

df.fillna(method='ffill', inplace=True)
4.43 ms ± 92 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 4

freerafiki
freerafiki

Reputation: 561

A simple way would be to use a loop, but that's inefficient in terms of time. However, if you don't mind, you can loop through the array

flag01 = 0
for ei in df.index:
    if (df.loc[ei, 'Close'] == df.loc[ei, 'HC']):
        flag01 = 1
    if (df.loc[ei, 'Close'] == df.loc[ei, 'LC'])
        flag01 = -1
    df.loc[ei, 'Flag'] = flag01

Basically you set up zero, whenever you find the condition true, you set to 1 and keep one until the condition is met to become -1, and so on. This is slower than the approach you used, but it's the simplest way to "keep the last value" as you are increasing the value and you know which value you had before.

Upvotes: 0

Related Questions