Plug4

Reputation: 3938

SAS creating a dynamic interval

This is somewhat complex (well to me at least).

Here is what I have to do: Say that I have the following dataset:

date    price  volume
02-Sep  40     100
03-Sep  45     200
04-Sep  46     150
05-Sep  43     300

Say that I have a breakpoint at which I want to start a new interval in my dataset. For instance, let the breakpoint be 200 units of traded volume.

What I want is to create an ID column with id = 1, 2, 3, ... that increments each time the cumulative volume reaches the breakpoint of 200. When the volume is summed per ID, the total must be the same (200) for every ID.

So using my example above, my final dataset should look like the following:

date    price  volume  id
02-Sep  40     100     1
03-Sep  45     100     1
03-Sep  45     100     2
04-Sep  46     100     2
04-Sep  46     50      3
05-Sep  43     150     3
05-Sep  43     150     4

(The last id may fall short of 200, but that is fine; I will drop the last id.)

As you can see, I had to "decompose" some rows (the second row, for instance, where I split the volume of 200 into two rows of 100) so that the volume summed per ID is always 200.

Upvotes: 2

Views: 1419

Answers (2)

Mozan Sykol

Reputation: 367

If you have a variable which indicates 'Buy' or 'Sell', then you can try this. Let's say this variable is called type and takes the values 'B' or 'S'. One advantage of using this method would be that it is easier to process 'by-groups' if any.

%let bucketsize = 200;

data tmp2;
  set tmp;
  /* Give the retained variables their starting values here, so they are */
  /* set once and carried across rows instead of being reset on every row. */
  retain volsumb 0 idb 1 volsums 0 ids 1;

  /* Store the current total for each type. */
  if type = 'B' then volsumb = volsumb + volume;
  else if type = 'S' then volsums = volsums + volume;

  /* If the total has reached the bucket size, then reset and increment id. */
  /* You have not said what to do if the total overshoots 200, for example if the */
  /* first two values are 150 and 75; this check only fires on an exact match. */
  if volsumb = &bucketsize then do; idb = idb + 1; volsumb = 0; end;
  if volsums = &bucketsize then do; ids = ids + 1; volsums = 0; end;

  drop volsumb volsums;
run;

Upvotes: 1

itzy

Reputation: 11755

Looks like you're doing volume bucketing for a flow toxicity VPIN calculation. I think this works:

%let bucketsize = 200;

data buckets(drop=bucket volume rename=(vol=volume));
    set tmp;
    retain bucket &bucketsize id 1;

    do until(volume=0);
        vol=min(volume,bucket);
        output;
        volume=volume-vol;
        bucket=bucket-vol;
        if bucket=0 then do;
            bucket=&bucketsize;
            id=id+1;
        end;
    end;
run;

I tested this with your dataset and the output looks right, but I would check several cases carefully to confirm that it works.
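One way to run that kind of check, as a rough sketch: build a small test dataset, rerun the step, and confirm that each id sums to the bucket size and that no volume is lost in the split. The rows after 05-Sep are hypothetical values I made up to cover a volume spanning several buckets, a zero volume, and a volume equal to the bucket size; the data step is the same one as above, repeated only so the block runs on its own. I have not included missing or negative volumes. As far as I can tell, a missing volume would keep the DO UNTIL loop from ever ending, so filter those out first.

```sas
%let bucketsize = 200;

/* Test data: the four rows from the question plus hypothetical edge cases. */
data tmp;
    input date $ price volume;
    datalines;
02-Sep 40 100
03-Sep 45 200
04-Sep 46 150
05-Sep 43 300
06-Sep 44 450
07-Sep 42 0
08-Sep 41 200
;
run;

/* Same bucketing step as above. */
data buckets(drop=bucket volume rename=(vol=volume));
    set tmp;
    retain bucket &bucketsize id 1;

    do until(volume=0);
        vol=min(volume,bucket);
        output;
        volume=volume-vol;
        bucket=bucket-vol;
        if bucket=0 then do;
            bucket=&bucketsize;
            id=id+1;
        end;
    end;
run;

proc sql;
    /* Every id except possibly the last should total the bucket size. */
    title "Volume per id (expect &bucketsize, except possibly the last id)";
    select id, sum(volume) as total_volume
    from buckets
    group by id;

    /* Splitting rows must not create or lose volume. */
    title "Total volume before and after splitting (should match)";
    select sum(volume) as original_total from tmp;
    select sum(volume) as bucketed_total from buckets;
quit;
title;
```

The test volumes add up to 1400, so with a bucket size of 200 I would expect seven ids that each total 200, with the zero-volume row passed through as a single row of volume 0.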

Upvotes: 4
