Reputation: 223
So I have a dataset with trading data that has been merged with block trades, which are essentially just very large trades all happening at once and executed by a big financial institution of some kind.
I am trying to see how the price of the security moves right after the block trade specifically in the minutes before and after a block trade.
The data has a variable called Date_Time which has the date and time and is formatted like this: 08FEB03: 03:20:00 (which would be February 8th 2003 at 3:20:00 AM). It also has a dummy variable for when a trade is classified as block trade, which says "Block" if its a block trade and is missing if its not a block trade.
What I would like to do is find every block trade and create a variable that has the time lapse after and before the block trade. For instance, if a block trade happened at 3:20 AM on February 8th 2010 then I want to look at the price at the trades executed right before (lets say at time 3:16 and time 3:19 if those are the trades right before) and also right after (such as at 3:21 and 3:23 if those are the trades right after). Right now I am only trying to look at 30 minute intervals. Therefore I would like my data to look like this
Date_Time|Block|Price|Time_From_Block_Trade
08FEB08:03:16:00 "." 113.01 -4
08FEB08:03:19:00 "." 113.02 -1
08FEB08:03:20:00 "Block" 113.02 0
08FEB08:03:21:00 "." 113.07 1
08FEB08:03:23:00 "." 113.09 3
I have everything except I don't know how to create the Time_From_Block_Trade variable at all. There are a bunch of block trades in the dataset and I would like to figure out how to create a variable that has the difference in the Date_Time between the block trade and the trades that occurred right before and right after the block trade.
I tried doing something with proc expand but I just can't seem to figure this out. Sorry if this is really easy but I am brand new at SAS.
Upvotes: 1
Views: 281
Reputation: 63424
The fairly straightforward way to approach this might be to split this up into a few pieces.
First, have a dataset of just your block trades. I assume you have some other variables on there that identify the trade specifically, and not just datetime (since trades can be simultaneous down to the microsecond, if I understand correctly)? If not then I assume you know how to get back to the original dataset and/or that you can make a unique ID. Let's say this is called StockID.
data blocks;
set trades;
if block=1;
keep stockID date_time block_id;
rename date_time=block_time;
block_id+1;
run;
Now merge this back to your original dataset.
proc sql;
create table block_trades as
select t.*, b.block_id, t.date_time - b.block_time as time_from_block_trade
from blocks b
left join trades t
on b.stockID=t.stockID
and abs(b.block_time -t.date_time) le 30
order by b.block_id, t.date_time;
quit;
This should now be an easy to analyze dataset with just the useful rows, a block_id to do by-group analysis with (or class), and time from block calculated for you.
This works using this initial dataset, for anyone else replicating this:
data trades;
call streaminit(7);
do stockID=1 to 5;
do date_time = '01JAN2014:10:00:00'dt to '01JAN2014:13:00:00'dt;
volume = 1000*rand('Uniform');
if volume > 999 then block=1;
else block=0;
output;
end;
end;
run;
Upvotes: 1