duckman
duckman

Reputation: 747

"BY variables are not properly sorted" error although it was sorted already

I am using SAS for a large dataset (>20gb). When I run a DATA step, I received the "BY variables are not properly sorted ......" although I sorted the dataset by the same variables. When I ran the PROC SORT again, SAS even said "Input dataset is already sorted, No sorting done" My code is:

proc sort data=output.TAQ;
    by market ric date miliseconds descending type order;
run;

options nomprint;

data markers (keep=market ric date miliseconds type order);
    set output.TAQ;
    by market ric date;

    if first.date;

    * ie do the following once per stock-day;
    * Make 1-second markers;

    /*Type="AMARK"; Order=0; * Set order to zero to ensure that markers get placed before trades and quotes that occur at the same milisecond;
    do i=((9*60*60)+(30*60)) to (16*60*60); miliseconds=i*1000; output; end;*/
run;

And the error message was:

ERROR: BY variables are not properly sorted on data set OUTPUT.TAQ.
RIC=CXR.CCP Date=20160914 Time=13:47:18.125 Type=Quote Price=. Volume=. BidPrice=9.03 BidSize=400
AskPrice=9.04 AskSize=100 Qualifiers=  order=116458952 Miliseconds=49638125 exchange=CCP market=1
FIRST.market=0 LAST.market=0 FIRST.RIC=0 LAST.RIC=0 FIRST.Date=0 LAST.Date=1 i=. _ERROR_=1
_N_=43297873
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 43297874 observations read from the data set OUTPUT.TAQ.
WARNING: The data set WORK.MARKERS may be incomplete.  When this step was stopped there were
         56770826 observations and 6 variables.
WARNING: Data set WORK.MARKERS was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           1:14.21
      cpu time            26.71 seconds

Upvotes: 4

Views: 10078

Answers (3)

pega
pega

Reputation: 23

I had the same error, and the solution was to make a copy of the original table in the work directory, do the sort, and then the "by" was working.

In your case something like below:

data tmp_TAQ;
    set output.TAQ;
run;

proc sort data=tmp_TAQ;
    by market ric date miliseconds descending type order;
run;

data markers (keep=market ric date miliseconds type order);
    set tmp_TAQ;
    by market ric date;

    if first.date;

    * ie do the following once per stock-day;
    * Make 1-second markers;

    /*Type="AMARK"; Order=0; * Set order to zero to ensure that markers get placed before trades and quotes that occur at the same milisecond;
    do i=((9*60*60)+(30*60)) to (16*60*60); miliseconds=i*1000; output; end;*/
run;

Upvotes: 0

Snorex
Snorex

Reputation: 914

The error is occurring deep into your data step, at _N_=43297873. That suggests to me that the PROC SORT is working up to a point, but then fails. It is hard to know what the reason is without knowing your SAS environment or how OUTPUT.TAQ is stored.

Some people have reported resource problems or file system limitations when sorting large data sets.

From SAS FAQ: Sorting Very Large Datasets with SAS (not an official source):

  • When sorting in a WORK folder, you must have free storage equal to 4x the size of the data set (or 5x if under Unix)

  • You may be running out of RAM

  • You may be able to use options MSGLEVEL=i and FULLSTIMER to get a fuller picture

Also using options sastraceloc=saslog; can produce helpful messages.

Maybe instead of sorting it, you could break it up into a few steps, something like:

/* Get your market ~ ric ~ date pairs */
proc sql;
   create table market_ric_date as
   select distinct market, ric, date
   from output.TAQ
   /* Possibly an order by clause here on market, ric, date */
; quit;

data millisecond_stuff;
  set market_ric_date; 
  *Possibly add type/order in this step as well?;
  do i=((9*60*60)+(30*60)) to (16*60*60); miliseconds=i*1000; output; end;
run;

/* Possibly a third step here to add type / order if you need to get from original data source */

Upvotes: 1

Allan Bowe
Allan Bowe

Reputation: 12691

If your source dataset is in a database, it may be sorted in a different collation.

Try the following before your sort:

options sortpgm=sas;

Upvotes: 0

Related Questions