user2665159
user2665159

Reputation: 61

proc transpose using SPDE takes ~60x longer to run than v9 library

I've been moving all of my datasets into SPDE libraries because I've experienced wonderful performance gains in everything. Everything until running proc transpose. This takes ~60x longer to execute on the SPDE dataset than the same dataset stored in normal v9 library. The data sets is sorted by item_id. It is being read/written to the same library.

Does anyone have an idea why this is the case? Am I missing something important about SPDE and Proc Transpose not playing well together?

SPDE Libary

MPRINT(XMLIMPORT_VANTAGE):   proc transpose data = smplus.links_response_mechanism out = smplus.response_mechanism (drop = _NAME_) 
prefix = rm_;
MPRINT(XMLIMPORT_VANTAGE):   by item_id;
MPRINT(XMLIMPORT_VANTAGE):   id lookup_code;
MPRINT(XMLIMPORT_VANTAGE):   var x;
MPRINT(XMLIMPORT_VANTAGE):   run;

NOTE: There were 5866747 observations read from the data set SMPLUS.LINKS_RESPONSE_MECHANISM.
NOTE: The data set SMPLUS.RESPONSE_MECHANISM has 3209353 observations and 14 variables.
NOTE: Compressing data set SMPLUS.RESPONSE_MECHANISM decreased size by 37.98 percent.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           28:27.63
      cpu time            28:34.64

V9 Library

MPRINT(XMLIMPORT_VANTAGE):   proc transpose data = mplus.links_response_mechanism out = mplus.response_mechanism (drop = _NAME_) 
prefix = rm_;
MPRINT(XMLIMPORT_VANTAGE):   by item_id;
68                                                         The SAS System                             02:00 Thursday, August 8, 2013

MPRINT(XMLIMPORT_VANTAGE):   id lookup_code;
MPRINT(XMLIMPORT_VANTAGE):   var x;
MPRINT(XMLIMPORT_VANTAGE):   run;

NOTE: There were 5866747 observations read from the data set MPLUS.LINKS_RESPONSE_MECHANISM.
NOTE: The data set MPLUS.RESPONSE_MECHANISM has 3209353 observations and 14 variables.
NOTE: Compressing data set MPLUS.RESPONSE_MECHANISM decreased size by 27.60 percent. 
      Compressed is 32271 pages; un-compressed would require 44572 pages.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           28.76 seconds
      cpu time            28.79 seconds

Upvotes: 4

Views: 517

Answers (3)

Freddie Fabila
Freddie Fabila

Reputation: 1

This issue usually occurs when PROC TRANSPOSE is used with BY-processing on compressed datasets. SAS is forced to read the same block of rows repeatedly decompressing them every time until all the records are fully sorted.

Set Compress=No option and it will work. See the log below, one program has Compress=yes and the other Compress=no, the former was 56 minutes vs .5 seconds.

OPTIONS COMPRESS=YES;

50         **tranpose from spde to spde;
51         proc transpose data=spdelib.balancewalkoutput out=spdelib.spdelib_to_spdelib;
52           var metric ;
53           by balancewalk facility_id isretained isexisting isicaapnpl monthofmaturity vintage;
54         run;

NOTE: There were 10000000 observations read from the data set SPDELIB.BALANCEWALKOUTPUT.
NOTE: The data set SPDELIB.SPDELIB_TO_SPDELIB has 160981 observations and 74 variables.
NOTE: Compressing data set SPDELIB.SPDELIB_TO_SPDELIB decreased size by 69.96 percent.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           56:58.54
      user cpu time       52:03.65
      system cpu time     4:03.00
      memory              19028.75k
      OS Memory           34208.00k
      Timestamp           09/16/2019 06:19:55 PM
      Step Count                        9  Switch Count  22476
      Page Faults                       0
      Page Reclaims                     4056
      Page Swaps                        0
      Voluntary Context Switches        142316
      Involuntary Context Switches      5726
      Block Input Operations            88
      Block Output Operations           569200


OPTIONS COMPRESS=NO;

50         **tranpose from spde to spde;
51         proc transpose data=spdelib.balancewalkoutput out=spdelib.spdelib_to_spdelib;
52           var metric ;
53           by balancewalk facility_id isretained isexisting isicaapnpl monthofmaturity vintage;

18                                                         The SAS System                           16:04 Monday, September 16, 2019

54         run;

NOTE: There were 10000000 observations read from the data set SPDELIB.BALANCEWALKOUTPUT.
NOTE: The data set SPDELIB.SPDELIB_TO_SPDELIB has 160981 observations and 74 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           26.73 seconds
      user cpu time       14.52 seconds
      system cpu time     11.99 seconds
      memory              13016.71k
      OS Memory           27556.00k
      Timestamp           09/16/2019 04:13:06 PM
      Step Count                        9  Switch Count  24827
      Page Faults                       0
      Page Reclaims                     2662
      Page Swaps                        0
      Voluntary Context Switches        162653
      Involuntary Context Switches      1678
      Block Input Operations            96
      Block Output Operations           1510040

Upvotes: 0

Allan Bowe
Allan Bowe

Reputation: 12701

There have been known issues with SPDE and proc compare in the past (not multi-threading), at least up to version 4.1. What version are you using? (can be seen in the “!install/logs” folder).

This is definitely something to raise with SAS support, to "speed" things along I would recommend submitting a log with the following options:

proc setinit noalias; run; 
proc options; run; 
%put _ALL_; 
options fullstimer msglevel=i;

Also:

options spdedebug='DA_TRACEIO_OCR CJNL=Trace.txt';

(The CJNL option simply routes the trace message output to a text file)

In the meantime, you may be able to take advantage of some of the following SPD specific options:

http://support.sas.com/kb/11/349.html

Upvotes: 1

Joe
Joe

Reputation: 63434

Looks to me that there is some issue with PROC TRANSPOSE and SPDE. Here's a simple SSCCE, which has significant differences; not as significant as yours, but to some extent that may be a factor of this being on a desktop with not particularly substantial performance tuning in the first place. Sounds like a call to SAS tech support is in order.

libname spdelib spde 'c:\temp\SPDE Main' 
    datapath=('c:\temp\SPDE Data' 'd:\temp\SPDE Data')
    indexpath=('d:\temp\SPDE Index')
    partsize=512;

libname mainlib 'c:\temp\';


data mainlib.bigdata;
do ID = 1 to 1500000;
  do _varn=1 to 10;
    varname=cats("Var_",_varn);
    vardata=ranuni(7);
    output;
  end;
end;
run;
data  spdelib.bigdata;
do ID = 1 to 1500000;
  do _varn=1 to 10;
    varname=cats("Var_",_varn);
    vardata=ranuni(7);
    output;
  end;
end;
run;
*These data steps take roughly the same amount of time, around 30 seconds each;

proc transpose data=spdelib.bigdata out=spdelib.transdata;
by id;
id varname;
var vardata;
run;
*Run a few times, this takes around 3 to 4 minutes, with 1.5 minutes CPU time;

proc transpose data=mainlib.bigdata out=mainlib.transdata;
by id;
id varname;
var vardata;
run;
*Run a few times, this takes around 30 to 45 seconds, with 20 seconds CPU time;

Upvotes: 3

Related Questions