Reputation: 184
I have 65K records like the snippet below in my dataframe:
Scrip Timestamp1 NSETS NSEPr Buyq1 Buyq2 Buyq3 Buyq4 Buyq5 Sellq1 Sellq2 Sellq3 Sellq4 Sellq5 Sellp1 Sellp2 Sellp3 Sellp4 Sellp5 buyp1 buyp2 buyp3 buyp4 buyp5 ActPr TotalBuyQty TotalSellQty
ALANKIT 2018-01-12 13:02:06 2018-01-12 13:00:50 78.10 759.00 100.00 996.00 1287.00 200 15.00 300.00 100.00 1787.00 5614.00 78.25 78.35 78.40 78.45 78.50 78.10 78.05 78.00 77.80 77.75 78.25 63928 194206
ALANKIT 2018-01-12 13:32:29 2018-01-12 13:22:21 79.50 28.00 100.00 200.00 1288.00 248 50.00 178.00 898.00 100.00 487.00 79.50 79.55 79.60 79.65 79.75 79.30 79.15 79.10 79.05 78.80 79.20 61927 175983
ALANKIT 2018-01-12 13:36:26 2018-01-12 13:34:51 79.20 39.00 3649.00 1287.00 7.00 11 1500.00 1024.00 1000.00 220.00 65.00 79.20 79.25 79.50 79.55 79.60 79.15 79.00 78.85 78.65 78.55 79.00 65503 176990
ALANKIT 2018-01-12 14:32:29 2018-01-12 14:31:23 78.80 810.00 1000.00 1287.00 1342.00 555 58.00 20.00 100.00 10.00 1250.00 78.80 78.85 78.90 78.95 79.00 78.70 78.60 78.55 78.50 78.30 78.70 84405 184759
ALANKIT 2018-01-12 14:12:58 2018-01-12 14:11:22 78.50 1.00 5.00 100.00 25.00 510 2542.00 25.00 95.00 50.00 500.00 78.50 78.55 78.60 78.85 78.90 78.30 78.25 78.20 78.15 78.10 78.85 74505 189866
APEX 2018-03-05 14:14:30 2018-03-05 14:13:23 72.00 51.00 71.00 20.00 150 1.00 1.00 14.00 20.00 1108.00 690.00 690.15 690.80 690.95 691.00 689.60 689.55 689.45 689.15 689.00 0 35535 61963 690.00
APEX 2018-01-31 11:52:11 2018-01-31 11:50:48 100.00 10.00 10.00 15.00 50 50.00 50.00 10.00 16.00 67.00 621.15 621.20 621.40 621.80 621.95 619.50 619.00 618.00 617.00 616.50 0 8083 25609 619.50
APEX 2018-01-31 11:56:14 2018-01-31 11:54:48 38.00 29.00 67.00 174.00 124 53.00 50.00 50.00 16.00 25.00 625.00 625.40 625.45 626.00 626.90 623.95 623.90 623.50 623.45 623.00 0 12587 23399 624.00
APEX 2018-01-18 09:36:03 2018-01-18 09:35:14 38.00 46.00 67.00 226.00 6 5.00 50.00 36.00 20.00 30.00 781.00 781.80 781.85 781.95 782.00 780.20 780.15 780.05 780.00 779.95 782.70 17023 21946 780.75
APEX 2018-01-18 09:44:16 2018-01-18 09:42:15 47.00 50.00 25.00 67.00 2887 25.00 8.00 58.00 5.00 50.00 791.60 791.65 791.95 792.30 792.65 790.20 790.15 790.00 789.05 789.00 791.45 22314 26007 790.05
STRTECH 2018-01-19 14:57:51 2018-01-19 14:56:24 68.50 1.00 5.00 2.00 3 3.00 20.00 3.00 5.00 10.00 2484.95 2485.00 2489.00 2489.90 2490.00 2477.55 2477.50 2477.20 2477.05 2476.70 2480.60 32408 8565 2485.00
STRTECH 2018-01-25 10:50:10 2018-01-25 10:47:46 32.65 1.00 511.00 1.00 12 9.00 5.00 100.00 23.00 20.00 2484.60 2484.70 2484.80 2485.00 2486.00 2480.15 2480.10 2480.00 2475.00 2471.15 2534.60 28306 18002 2484.70
Within the same Scrip and the same date (taken from the field Timestamp1), I would like to query all the records and return those which satisfy 2 complex conditions.
These conditions are:
a) The NSEPr value should be at least 3.5% higher than the first NSEPr value for that day (the day can be extracted from Timestamp1 here).
b) The sum of the values Sellq1 + Sellq2 + ... + Sellq5 should be 3 times (or higher than) the sum of the values Buyq1 + Buyq2 + ... + Buyq5.
I managed to extract the date from Timestamp1 using df['mydt'] = df.Timestamp1.dt.date.
I tried achieving the above with a for loop over df.iterrows(), i.e. iterating across the DataFrame. This failed due to an endless loop.
I remember the above is achievable using df.groupby(['Scrip', 'mydt']).apply,
or perhaps by using df.groupby(['Scrip', 'mydt']).apply(lambda x: ...).
However, I am not able to find the solution to this. I would really appreciate some help with the above.
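For reference, both conditions can also be expressed without an explicit loop, using groupby(...).transform('first'). Below is a minimal sketch on a made-up three-row frame (only the columns the conditions need; the values are invented, not taken from the data above):

```python
import pandas as pd

# Made-up miniature frame; column names follow the snippet above,
# values invented so that exactly one row passes both conditions.
df = pd.DataFrame({
    'Scrip': ['ALANKIT'] * 3,
    'Timestamp1': pd.to_datetime(['2018-01-12 09:15:00',
                                  '2018-01-12 13:02:06',
                                  '2018-01-12 13:32:29']),
    'NSEPr': [76.00, 78.10, 79.50],
    'Buyq1': [10.0, 759.0, 28.0],  'Buyq2': [5.0, 100.0, 100.0],
    'Buyq3': [0.0, 996.0, 200.0],  'Buyq4': [0.0, 1287.0, 1288.0],
    'Buyq5': [0.0, 200.0, 248.0],
    'Sellq1': [100.0, 15.0, 5000.0], 'Sellq2': [50.0, 300.0, 178.0],
    'Sellq3': [0.0, 100.0, 898.0],   'Sellq4': [0.0, 1787.0, 100.0],
    'Sellq5': [0.0, 5614.0, 487.0],
})
df['mydt'] = df.Timestamp1.dt.date

# a) NSEPr at least 3.5 % above the day's first NSEPr, per Scrip and day.
first_pr = df.groupby(['Scrip', 'mydt'])['NSEPr'].transform('first')
cond_a = df['NSEPr'] >= first_pr * 1.035

# b) total sell quantity at least 3x the total buy quantity.
sells = df[['Sellq1', 'Sellq2', 'Sellq3', 'Sellq4', 'Sellq5']].sum(axis=1)
buys = df[['Buyq1', 'Buyq2', 'Buyq3', 'Buyq4', 'Buyq5']].sum(axis=1)
cond_b = sells >= 3 * buys

result = df[cond_a & cond_b]  # only the last row qualifies here
```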
TIA.
Upvotes: 1
Views: 119
Reputation: 4521
It would look like:
# get the first values per scrip and day
df_a_first_vals = df.groupby([df['Timestamp1'].dt.date, df['Scrip']]).agg({'NSEPr': 'first'})
# create an indexer for condition b and extract the
# corresponding data with the date stored in a separate
# column
df_b_indexer = df[['Sellq1', 'Sellq2', 'Sellq3', 'Sellq4', 'Sellq5']].sum(axis='columns') >= df[['Buyq1', 'Buyq2', 'Buyq3', 'Buyq4', 'Buyq5']].sum(axis='columns') * 3
df_b_data = df[df_b_indexer].copy(deep=True)
df_b_data['Timestamp1_date'] = df_b_data['Timestamp1'].dt.date
# merge a and b to apply condition a
df_ab_merged = df_b_data.merge(df_a_first_vals, left_on=['Timestamp1_date', 'Scrip'], right_index=True, suffixes=['', '_first'])
# output the result
df_ab_merged[df_ab_merged['NSEPr'] >= df_ab_merged['NSEPr_first'] * 1.035]
It seems your data does not contain such a record, so I just changed the NSEPr value for (APEX, 2018-01-31T11:52:11)
from 100.00 to 20.00. The logic above then outputs the second row of that day:
Out[148]:
Scrip Timestamp1 NSETS NSEPr ... TotalBuyQty TotalSellQty Timestamp1_date NSEPr_first
7 APEX 2018-01-31 11:56:14 2018-01-31 11:54:48 38.0 ... 23399 624.0 2018-01-31 20.0
[1 rows x 29 columns]
Btw, if your data is really large and you want to avoid the deep copy above, you could just store the date part of Timestamp1
as a separate column.
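A minimal sketch of that variant, with a hypothetical two-row frame standing in for the full data:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the full data.
df = pd.DataFrame({
    'Scrip': ['APEX', 'APEX'],
    'Timestamp1': pd.to_datetime(['2018-01-31 11:52:11',
                                  '2018-01-31 11:56:14']),
})

# Add the date once on the full frame; boolean filtering then
# selects from it directly, with no copy(deep=True) needed afterwards.
df['Timestamp1_date'] = df['Timestamp1'].dt.date
filtered = df[df['Scrip'] == 'APEX']  # stand-in for the condition-b filter
```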
Testdata (I just manually changed the second-last record so it conforms to the condition):
raw="""Scrip Timestamp1 NSETS NSEPr Buyq1 Buyq2 Buyq3 Buyq4 Buyq5 Sellq1 Sellq2 Sellq3 Sellq4 Sellq5 Sellp1 Sellp2 Sellp3 Sellp4 Sellp5 buyp1 buyp2 buyp3 buyp4 buyp5 ActPr TotalBuyQty TotalSellQty
ALANKIT 2018-01-12T13:02:06 2018-01-12T13:00:50 78.10 759.00 100.00 996.00 1287.00 200 15.00 300.00 100.00 1787.00 5614.00 78.25 78.35 78.40 78.45 78.50 78.10 78.05 78.00 77.80 77.75 78.25 63928 194206
ALANKIT 2018-01-12T13:32:29 2018-01-12T13:22:21 79.50 28.00 100.00 200.00 1288.00 248 50.00 178.00 898.00 100.00 487.00 79.50 79.55 79.60 79.65 79.75 79.30 79.15 79.10 79.05 78.80 79.20 61927 175983
ALANKIT 2018-01-12T13:36:26 2018-01-12T13:34:51 79.20 39.00 3649.00 1287.00 7.00 11 1500.00 1024.00 1000.00 220.00 65.00 79.20 79.25 79.50 79.55 79.60 79.15 79.00 78.85 78.65 78.55 79.00 65503 176990
ALANKIT 2018-01-12T14:32:29 2018-01-12T14:31:23 78.80 810.00 1000.00 1287.00 1342.00 555 58.00 20.00 100.00 10.00 1250.00 78.80 78.85 78.90 78.95 79.00 78.70 78.60 78.55 78.50 78.30 78.70 84405 184759
ALANKIT 2018-01-12T14:12:58 2018-01-12T14:11:22 78.50 1.00 5.00 100.00 25.00 510 2542.00 25.00 95.00 50.00 500.00 78.50 78.55 78.60 78.85 78.90 78.30 78.25 78.20 78.15 78.10 78.85 74505 189866
APEX 2018-03-05T14:14:30 2018-03-05T14:13:23 72.00 51.00 71.00 20.00 150 1.00 1.00 14.00 20.00 1108.00 690.00 690.15 690.80 690.95 691.00 689.60 689.55 689.45 689.15 689.00 0 35535 61963 690.00
APEX 2018-01-31T11:52:11 2018-01-31T11:50:48 20.00 10.00 10.00 15.00 50 50.00 50.00 10.00 16.00 67.00 621.15 621.20 621.40 621.80 621.95 619.50 619.00 618.00 617.00 616.50 0 8083 25609 619.50
APEX 2018-01-31T11:56:14 2018-01-31T11:54:48 38.00 29.00 67.00 174.00 124 53.00 50.00 50.00 16.00 25.00 625.00 625.40 625.45 626.00 626.90 623.95 623.90 623.50 623.45 623.00 0 12587 23399 624.00
APEX 2018-01-18T09:36:03 2018-01-18T09:35:14 38.00 46.00 67.00 226.00 6 5.00 50.00 36.00 20.00 30.00 781.00 781.80 781.85 781.95 782.00 780.20 780.15 780.05 780.00 779.95 782.70 17023 21946 780.75
APEX 2018-01-18T09:44:16 2018-01-18T09:42:15 47.00 50.00 25.00 67.00 2887 25.00 8.00 58.00 5.00 50.00 791.60 791.65 791.95 792.30 792.65 790.20 790.15 790.00 789.05 789.00 791.45 22314 26007 790.05
STRTECH 2018-01-19T14:57:51 2018-01-19T14:56:24 20.50 1.00 5.00 2.00 3 3.00 20.00 3.00 5.00 10.00 2484.95 2485.00 2489.00 2489.90 2490.00 2477.55 2477.50 2477.20 2477.05 2476.70 2480.60 32408 8565 2485.00
STRTECH 2018-01-19T15:50:10 2018-01-25T10:47:46 32.65 1.00 511.00 1.00 12 9.00 5.00 100.00 23.00 20.00 2484.60 2484.70 2484.80 2485.00 2486.00 2480.15 2480.10 2480.00 2475.00 2471.15 2534.60 28306 18002 2484.70"""
import io
import pandas as pd

df = pd.read_csv(io.StringIO(raw), sep=r'\s+', parse_dates=['Timestamp1', 'NSETS'], index_col=None)
Result:
Out[212]:
Scrip Timestamp1 NSETS NSEPr ... TotalBuyQty TotalSellQty Timestamp1_date NSEPr_first
11 STRTECH 2018-01-19 15:50:10 2018-01-25 10:47:46 32.65 ... 18002 2484.7 2018-01-19 20.5
[1 rows x 29 columns]
Upvotes: 1
Reputation: 8033
Check if this works for you. First we group by Scrip and the date part of Timestamp1 (grouping by the full timestamp would give one-row groups, so the first NSEPr of each group would always be the row's own value):
grouped = df.groupby(['Scrip', df['Timestamp1'].dt.date])
Now we take the grouped dataframe and check which rows meet our conditions. Rows meeting the price condition can be obtained as follows:
price_condition = []
for g_idx, group in grouped:
    for row_idx, row in group.iterrows():
        if row.NSEPr > group.NSEPr.values[0] * 1.035:
            price_condition.append(row_idx)

df.iloc[price_condition]
Rows meeting the quantity condition can be obtained as follows (only 2 quantities are used in this code):
quantity_condition = []
for g_idx, group in grouped:
    for row_idx, row in group.iterrows():
        if (row.Sellq1 + row.Sellq2) > 3 * (row.Buyq1 + row.Buyq2):
            quantity_condition.append(row_idx)

df.iloc[quantity_condition]
Now the rows meeting both conditions can be obtained as follows:
pnq_condition = []
for g_idx, group in grouped:
    for row_idx, row in group.iterrows():
        if ((row.Sellq1 + row.Sellq2 + row.Sellq3 + row.Sellq4 + row.Sellq5) >
                3 * (row.Buyq1 + row.Buyq2 + row.Buyq3 + row.Buyq4 + row.Buyq5)
                and row.NSEPr > group.NSEPr.values[0] * 1.035):
            pnq_condition.append(row_idx)

df.iloc[pnq_condition]
I was able to check the rows that meet the price condition and the quantity condition separately. However, in the data you provided, there are no rows that meet both conditions. So check on your complete data and let us know if this code works for you.
Upvotes: 0