Reputation: 3938
This is somewhat complex (well to me at least).
Here is what I have to do: Say that I have the following dataset:
date price volume
02-Sep 40 100
03-Sep 45 200
04-Sep 46 150
05-Sep 43 300
Say that I have a breakpoint where I wish to create an interval in my dataset. For instance, let my breakpoint = 200 volume transaction.
What I want is to create an ID column and record an ID variable =1,2,3,... for every breakpoint = 200. When you sum all the volume per ID, the value must be constant across all ID variables.
So using my example above, my final dataset should look like the following:
date price volume id
02-Sep 40 100 1
03-Sep 45 100 1
03-Sep 45 100 2
04-Sep 46 100 2
04-Sep 46 50 3
05-Sep 43 150 3
05-Sep 43 150 4
(last row can miss some value but that is fine. I will kick out the last id)
As you can see, I had to "decompose" some rows (like the second row for instance, I break the 200 into two 100 volume) in order to have constant value of the sum, 200, of volume across all ID.
Upvotes: 2
Views: 1419
Reputation: 367
If you have a variable which indicates 'Buy' or 'Sell', then you can try this. Let's say this variable is called type and takes the values 'B' or 'S'. One advantage of using this method would be that it is easier to process 'by-groups' if any.
%let bucketsize = 200;
data tmp2;
set tmp;
retain volsumb idb volusums ids;
/* Initialize. */
volusumb = 0; idb = 1; volsums = 0; ids = 1;
/* Store the current total for each type. */
if type = 'B' then volsumb = volsumb + volume;
else if type = 'S' then volsums = volsums + volume;
/* If the total has reached 200, then reset and increment id. */
/* You have not given the algorithm if the volume exceeds 200, for example the first two values are 150 and 75. */
if volsumb = &bucketsize then do; idb = idb + 1; volsumb = 0; end;
if volsums = &bucketsize then do; ids = ids + 1; volsums = 0; end;
drop volsumb volsums;
run;
Upvotes: 1
Reputation: 11755
Looks like you're doing volume bucketing for a flow toxicity VPIN calculation. I think this works:
%let bucketsize = 200;
data buckets(drop=bucket volume rename=(vol=volume));
set tmp;
retain bucket &bucketsize id 1;
do until(volume=0);
vol=min(volume,bucket);
output;
volume=volume-vol;
bucket=bucket-vol;
if bucket=0 then do;
bucket=&bucketsize;
id=id+1;
end;
end;
run;
I tested this with your dataset and it looks right, but I would check carefully several cases to confirm that it works right.
Upvotes: 4