RustyShackleford
RustyShackleford

Reputation: 3667

How to compare and iterate over certain rows in column while creating output as new column in dataframe?

I am wanting to backtest a trading strategy.

The data I have is OHLC (open,high,low, close) for a financial product, that is formatted into a dataframe with 300 rows (each row is 1 day) like so:

datetime      O   H  L  C
2020-03-24    1   2  3  4
2020-03-23    5   6  7  8
2020-03-22    9   1  2  3 
2020-03-21    9   2  2  3 
2020-03-20    9   3  2  3 
2020-03-19    9   4  2  3 
2020-03-18    9   5  2  3 

What I want to do is, starting on the date closet to current date, in this case row with 2020-03-24:

 1. take the number in column `L` 
 2. compare if the number in column `L` is at any point greater than the values in column `L` for the previous two days. 
 3. Create and fill in new column if value from 1 is greater than value in interation. 
 4. Repeat steps 1, 2, & 3 but take the number in column `L` that was not into included in the iteration. 

Example:

1. Starting on row `2020-03-24`, take value `3`
2. Is `3` at any point greater than `7` or `2` for rows starting with `2020-03-23` and `2020-03-22`? 
3. YES,assign `TRUE` to column `comparison` in df for row starting with `2020-03-24`
4. Repeat, starting on row `2020-03-21`, take value `2` in column `L`
4a. Is `2` at any point greater than values in rows `2020-03-20` or `2020-03-19`? 
4b. NO, assign `FALSE` to column `comparison` in df for row starting with `2020-03-21`. 

New df looks like this:

datetime      O   H  L  C    Comparison
2020-03-24    1   2  3  4      TRUE
2020-03-23    5   6  7  8
2020-03-22    9   1  2  3 
2020-03-21    9   2  2  3      FALSE
2020-03-20    9   3  2  3 
2020-03-19    9   4  2  3 
2020-03-18    9   5  2  3 

The only way I know how to do this is with a FOR loop, but that doesnt work on iterating and comparing only certain subsets like so:

for i in df['L']:
   if df['L'] > 

Upvotes: 0

Views: 70

Answers (1)

above_c_level
above_c_level

Reputation: 3929

You need a combination of rolling() and shift():

df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True, ascending=False)
df['Comparison'] = False
df['Comparison'] = df.loc[:, 'L'] > df.loc[:, 'L'].rolling(window=2).min().shift(-2)

With rolling() you get the minimum of the last two days, shift() moves it to the right row.

Upvotes: 1

Related Questions