Subsetting pandas df based on time ranges

Question

My DataFrame looks like this and is rather large:

    contract    time        Open        High        Low         Last
0   CME/TYH2018 2017-09-18  125.687500  125.750000  125.687500  125.750000
1   CME/TYH2018 2017-09-20  125.703125  125.750000  125.234375  125.375000
2   CME/TYH2018 2017-09-22  125.609375  125.609375  125.437500  125.484375
3   CME/TYH2018 2017-09-25  125.687500  125.812500  125.687500  125.765625
4   CME/TYH2018 2017-09-26  125.640625  125.796875  125.562500  125.625000
5   CME/TYH2018 2017-09-27  125.171875  125.218750  125.031250  125.125000
371 CME/TYZ2018 2018-07-12  119.984375  120.062500  119.859375  120.015625
372 CME/TYZ2018 2018-07-13  120.156250  120.234375  120.078125  120.218750
373 CME/TYZ2018 2018-07-16  120.000000  120.031250  119.859375  120.000000
374 CME/TYZ2018 2018-07-17  119.968750  120.046875  119.890625  119.953125
375 CME/TYZ2018 2018-07-18  119.875000  120.062500  119.843750  119.890625

I want to slice the data as follows: for every unique contract, keep only the rows from the last 100 days of that contract's data.

For a single contract, the start of the window would be:

df.loc[df.contract=='CME/TYH2018'].time.max() - datetime.timedelta(days=100)

All rows before that date should be discarded.

jezrael · Accepted Answer

Use GroupBy.transform with 'max' to get a Series the same length as the DataFrame, holding each contract's latest date on every row. Subtract a timedelta from it, then filter with boolean indexing:

import pandas as pd
# Per-row cutoff: the latest date of the row's own contract, minus 100 days
shifted = df.groupby('contract')['time'].transform('max') - pd.Timedelta(100, unit='d')
df = df[df['time'] > shifted]
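
To see why transform is the right tool here, the following minimal sketch compares it with a plain aggregation. The frame `toy` and its values are made up for illustration and are not the question's data.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the question)
toy = pd.DataFrame({
    "contract": ["A", "A", "B", "B", "B"],
    "time": pd.to_datetime(
        ["2018-01-01", "2018-01-10", "2018-03-01", "2018-03-05", "2018-03-20"]
    ),
})

# transform('max') broadcasts each group's maximum back onto every row,
# so the result shares the index and length of the original frame.
group_max = toy.groupby("contract")["time"].transform("max")

# A plain aggregation returns one value per contract instead.
per_contract = toy.groupby("contract")["time"].max()

print(len(group_max), len(per_contract))  # 5 2
```

Because `group_max` lines up row by row with `toy`, it can be compared directly against `toy['time']` to build a boolean mask, which the one-row-per-contract result cannot do without a merge or map step.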

Test with sample data for 3 days:

shifted = df.groupby('contract')['time'].transform('max') - pd.Timedelta(3, unit='d')
df = df[df['time'] > shifted]
print(df)
        contract       time        Open        High         Low        Last
3    CME/TYH2018 2017-09-25  125.687500  125.812500  125.687500  125.765625
4    CME/TYH2018 2017-09-26  125.640625  125.796875  125.562500  125.625000
5    CME/TYH2018 2017-09-27  125.171875  125.218750  125.031250  125.125000
373  CME/TYZ2018 2018-07-16  120.000000  120.031250  119.859375  120.000000
374  CME/TYZ2018 2018-07-17  119.968750  120.046875  119.890625  119.953125
375  CME/TYZ2018 2018-07-18  119.875000  120.062500  119.843750  119.890625
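
For anyone who wants to reproduce this end to end, here is a self-contained sketch of the same approach. The helper name `keep_last_days` and its `inclusive` flag are hypothetical additions, not part of the answer; the sample frame reuses the question's dates and `Last` values and drops the other price columns for brevity. The `pd.to_datetime` call is there on the assumption that the time column might have been loaded as strings.

```python
import pandas as pd

def keep_last_days(frame, days, group_col="contract", time_col="time", inclusive=False):
    """Keep rows within `days` days of each group's latest timestamp.

    Hypothetical helper: `inclusive=True` also keeps rows that fall exactly
    on the cutoff date (>=), while the default mirrors the answer's strict >.
    """
    times = pd.to_datetime(frame[time_col])  # no-op if already datetime64
    cutoff = times.groupby(frame[group_col]).transform("max") - pd.Timedelta(days=days)
    mask = times >= cutoff if inclusive else times > cutoff
    return frame[mask]

# Sample rows from the question (only the Last price column is kept)
sample = pd.DataFrame(
    {
        "contract": ["CME/TYH2018"] * 6 + ["CME/TYZ2018"] * 5,
        "time": [
            "2017-09-18", "2017-09-20", "2017-09-22", "2017-09-25",
            "2017-09-26", "2017-09-27", "2018-07-12", "2018-07-13",
            "2018-07-16", "2018-07-17", "2018-07-18",
        ],
        "Last": [
            125.750000, 125.375000, 125.484375, 125.765625, 125.625000,
            125.125000, 120.015625, 120.218750, 120.000000, 119.953125,
            119.890625,
        ],
    },
    index=[0, 1, 2, 3, 4, 5, 371, 372, 373, 374, 375],
)

result = keep_last_days(sample, 3)
print(result.index.tolist())  # [3, 4, 5, 373, 374, 375]
```

The strict comparison drops a row dated exactly at the cutoff. With a 5-day window on this sample, rows 2 and 372 sit on the cutoff, so they are kept only when `inclusive=True`. Which boundary is wanted depends on how "start of data" in the question is read.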
