nipy

Reputation: 5488

Path dependent slicing - function code modification

I am playing with the really nice code @piRSquared has provided, which can be seen below.

I have added another condition, if row[col2] == 4000, and this value is only seen once in the additional column I added. As expected, the function now yields only a single row, since the condition is met just once.

My question is: how can the code be modified to then yield another row once the move is >= move_size?

The desired output is two rows: one when row['B'] == 4000 (as the code produces now) and another when a move >= move_size is seen in column A. I see these as a trade entry and exit, so it would also be nice to have an order id in another dataframe column, df['C'], as per the desired output shown below.

Code from original post:

#starting python community conventions
import numpy as np
import pandas as pd

# n is number of observations
n = 5000

day = pd.to_datetime(['2013-02-06'])
# irregular seconds spanning 28800 seconds (8 hours)
seconds = np.random.rand(n) * 28800 * pd.Timedelta(1, 's')
# start at 8 am
start = pd.offsets.Hour(8)
# irregular timeseries
tidx = day + start + seconds
tidx = tidx.sort_values()

s = pd.Series(np.random.randn(n), tidx, name='A').cumsum()
s.plot()

Generator function with slight modification:

def mover_df(df, col, col2, move_size=10):
    ref = None
    for i, row in df.iterrows():
        # added test condition for the new col2 signal column
        if row[col2] == 4000:
            if ref is None or (abs(ref - row.loc[col]) >= move_size):
                yield row
                ref = row.loc[col]

Generate data

df = s.to_frame()
df['B'] = range(0,len(df))

moves_df = pd.concat(mover_df(df, 'A','B', 3), axis=1).T

Current output:

                                  A         B
2013-02-06 14:30:43.874386317   -50.136432  4000.0

Desired output:

(The values in columns A and B on the second row would be whatever the code generates; I have just added placeholder values to show the format I'm interested in. Column C is the trade id, and it would increment by 1 for every two rows.)

                                  A         B       C
2013-02-06 14:30:43.874386317   -50.136432  4000.0  1
2013-02-06 14:30:43.874386317   -47.136432  6000.0  1

I have been trying to code this for hours (it doesn't help with the kids running around the house now that it's the school holidays...) and appreciate any help. It would be fantastic to get input from @piRSquared, but I appreciate people are busy.

Upvotes: 1

Views: 83

Answers (2)

piRSquared

Reputation: 294258

I'd edit mover_df like this.

Note: I changed the 4000 condition to % 1000 == 0 to give a few more samples.

def mover_df(df, move_col, look_col, move_size=10):
    ref, seen = None, False
    for i, row in df.iterrows():
        # entry condition on the signal column (look_col)
        look_cond = row[look_col] % 1000 == 0
        if look_cond and not seen:
            yield row
            # remember the entry level and flag that we are in a trade
            ref, seen = row.loc[move_col], True
        elif seen:
            # exit condition: the move from the entry level is at least move_size
            move_cond = (abs(ref - row.loc[move_col]) >= move_size)
            if move_cond:
                yield row
                # reset so the next marker can start a new trade
                ref, seen = None, False


df = s.to_frame()
df['B'] = range(0,len(df))

moves_df = pd.concat(mover_df(df, 'A','B', 3), axis=1).T

print(moves_df)

                                       A       B
2013-02-06 08:00:03.264481639   0.554390     0.0
2013-02-06 08:04:26.609855185  -2.479520    35.0
2013-02-06 09:38:07.962175581 -15.042391  1000.0
2013-02-06 09:40:50.737806497 -18.385956  1026.0
2013-02-06 11:13:03.018013689 -29.074125  2000.0
2013-02-06 11:14:30.980633575 -32.221009  2019.0
2013-02-06 12:49:41.432845325 -35.048040  3000.0
2013-02-06 12:50:28.098114592 -38.881795  3012.0
2013-02-06 14:27:15.008225195  13.437165  4000.0
2013-02-06 14:27:32.790466500   9.513736  4003.0
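
For the trade id column C asked for in the question, one simple option is a position-based assignment on moves_df afterwards. This is just a sketch and assumes the rows always arrive in entry/exit pairs, as in the output above:

# rows 0 and 1 get id 1, rows 2 and 3 get id 2, and so on
moves_df['C'] = [i // 2 + 1 for i in range(len(moves_df))]
print(moves_df)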

caveat
This will continue to look for an exit until one is found or the end of the dataframe is reached, even if another potential entry point goes by. Meaning, in my example, I enter at every 1000-row marker and then look for a move of at least move_size to exit. If I do not find such a move before the next 1000-row marker arrives, I ignore that marker and keep looking for an exit.

The philosophy was that if I'm in the trade, I have to exit. I don't want to enter into another trade prior to resolving the one I'm still in.
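
If you did want a new marker to restart the trade rather than be ignored, a sketch of that variant would simply reset the reference at every marker (mover_df_restart is just an illustrative name, and the pairing of entry/exit rows is no longer guaranteed):

def mover_df_restart(df, move_col, look_col, move_size=10):
    # variant: every marker row starts a fresh trade, even if the previous one never exited
    ref = None
    for i, row in df.iterrows():
        if row[look_col] % 1000 == 0:
            yield row                        # entry (or re-entry) row
            ref = row.loc[move_col]          # reset the reference level at every marker
        elif ref is not None and abs(ref - row.loc[move_col]) >= move_size:
            yield row                        # exit row
            ref = None                       # flat again until the next marker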

Upvotes: 1

Lucas Currah

Reputation: 418

I don't have too much experience with generators or Pandas, but does this work? My data produces different output because of the random seed, so I am not sure.

I changed the generator to also handle the case after the first condition, row[col2] == 4000, has been met, so pulling two values from the generator should give both rows:

def mover_df(df, col, col2, move_size=10, found=False):
    ref = None
    for i, row in df.iterrows():
        #added test condition for new col2 signal column
        if row[col2] == 4000:
            if ref is None or (abs(ref - row.loc[col]) >= move_size):
                yield row
                found = True   # flag that we found the first row we want
                ref = row.loc[col]
        elif found:  # if we found the first row, find the second meeting the condition
            if ref is None or (abs(ref - row.loc[col]) >= move_size):
                yield row

And then you can use it like this:

data_generator = mover_df(df, 'A', 'B', 3)
moves_df = pd.concat([next(data_generator), next(data_generator)], axis=1).T
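
If the conditions are never met, the generator will yield fewer than two rows and the second next() call will raise StopIteration. A slightly more defensive sketch using itertools.islice takes at most two rows:

from itertools import islice

data_generator = mover_df(df, 'A', 'B', 3)
# take at most two rows; a short list is returned instead of raising StopIteration
rows = list(islice(data_generator, 2))
if rows:
    moves_df = pd.concat(rows, axis=1).T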

Upvotes: 2
