Reputation: 5488
I am playing with the really nice code @piRSquared has provided and this code can be seen below.
I have added another condition if row[col2] == 4000
and this is only seen once in the additional column I added. As expected this additional code has the function yield only a single row as the condition is only seen once.
My question is how can the code be modified to then yield another row after the move is >= move_size
.
Desired output is two rows. One when row['B'] == 4000
(as the code produces now) and another when a move is seen >= move_size
in Col A
. I see these as a trade entry and exit so it would be nice to have an order id in another dataframe column df['C']
as per desired output shown below.
Code from original post:
#starting python community conventions
import numpy as np
import pandas as pd
# n is number of observations
n = 5000
day = pd.to_datetime(['2013-02-06'])
# irregular seconds spanning 28800 seconds (8 hours)
seconds = np.random.rand(n) * 28800 * pd.Timedelta(1, 's')
# start at 8 am
start = pd.offsets.Hour(8)
# irregular timeseries
tidx = day + start + seconds
tidx = tidx.sort_values()
s = pd.Series(np.random.randn(n), tidx, name='A').cumsum()
s.plot()
Generator function with slight modification:
def mover_df(df, col,col2, move_size=10):
ref = None
for i, row in df.iterrows():
#added test condition for new col2 signal column
if row[col2] == 4000:
if ref is None or (abs(ref - row.loc[col]) >= move_size):
yield row
ref = row.loc[col]
Generate data
df = s.to_frame()
df['B'] = range(0,len(df))
moves_df = pd.concat(mover_df(df, 'A','B', 3), axis=1).T
Current output:
A B
2013-02-06 14:30:43.874386317 -50.136432 4000.0
Desired output:
(Values in cols A,B
on the second row would be whatever the code generates,I have just added random values to show the format I'm interested in. Col C
is the trade id and for every two rows this would increment +1)
A B C
2013-02-06 14:30:43.874386317 -50.136432 4000.0 1
2013-02-06 14:30:43.874386317 -47.136432 6000.0 1
I have been tying to code this for hours (doesn't help with the kids running around the house now its the school holidays...) and appreciate any help. Would be fantastic to get input from @piRSquared but appreciate people are busy.
Upvotes: 1
Views: 83
Reputation: 294258
I'd edit the mover_df
like this
note:
I changed 4000
condition to % 1000 == 0
to give a few more samples
def mover_df(df, move_col, look_col, move_size=10):
ref, seen = None, False
for i, row in df.iterrows():
#added test condition for new col2 signal column
look_cond = row[look_col] % 1000 == 0
if look_cond and not seen:
yield row
ref, seen = row.loc[move_col], True
elif seen:
move_cond = (abs(ref - row.loc[move_col]) >= move_size)
if move_cond:
yield row
ref, seen = None, False
df = s.to_frame()
df['B'] = range(0,len(df))
moves_df = pd.concat(mover_df(df, 'A','B', 3), axis=1).T
print(moves_df)
A B
2013-02-06 08:00:03.264481639 0.554390 0.0
2013-02-06 08:04:26.609855185 -2.479520 35.0
2013-02-06 09:38:07.962175581 -15.042391 1000.0
2013-02-06 09:40:50.737806497 -18.385956 1026.0
2013-02-06 11:13:03.018013689 -29.074125 2000.0
2013-02-06 11:14:30.980633575 -32.221009 2019.0
2013-02-06 12:49:41.432845325 -35.048040 3000.0
2013-02-06 12:50:28.098114592 -38.881795 3012.0
2013-02-06 14:27:15.008225195 13.437165 4000.0
2013-02-06 14:27:32.790466500 9.513736 4003.0
caveat
This will continue to look for an exit until it is found or you reach the end of the dataframe even if you reach another potential entry point. Meaning, in my example, I look every 1000 rows and enter. I then look for when the move is greater than 10 and exit. If I do not find a move greater than 10 before the next 1000 row market arrives, I'll ignore that 1000 row marker and continue looking for an exit.
The philosophy was that if I'm in the trade, I have to exit. I don't want to enter into another trade prior to resolving the one I'm still in.
Upvotes: 1
Reputation: 418
I don't have too much experience with generators or Pandas, but does this work? My data has different output due to the random seed so I am not sure.
I changed the generator to include the alternative case given, that the first column row[col2] == 4000
, so calling the generator twice should give both values:
def mover_df(df, col, col2, move_size=10, found=False):
ref = None
for i, row in df.iterrows():
#added test condition for new col2 signal column
if row[col2] == 4000:
if ref is None or (abs(ref - row.loc[col]) >= move_size):
yield row
found = True # flag that we found the first row we want
ref = row.loc[col]
elif found: # if we found the first row, find the second meeting the condition
if ref is None or (abs(ref - row.loc[col]) >= move_size):
yield row
And then you can use it like this:
data_generator = mover_df(df, 'A', 'B', 3)
moves_df = pd.concat([data.next(), data.next()], axis=1).T
Upvotes: 2