Reputation: 61
I've been moving all of my datasets into SPDE libraries because I've experienced wonderful performance gains in everything. Everything until running proc transpose. This takes ~60x longer to execute on the SPDE dataset than the same dataset stored in normal v9 library. The data sets is sorted by item_id. It is being read/written to the same library.
Does anyone have an idea why this is the case? Am I missing something important about SPDE and Proc Transpose not playing well together?
SPDE Libary
MPRINT(XMLIMPORT_VANTAGE): proc transpose data = smplus.links_response_mechanism out = smplus.response_mechanism (drop = _NAME_)
prefix = rm_;
MPRINT(XMLIMPORT_VANTAGE): by item_id;
MPRINT(XMLIMPORT_VANTAGE): id lookup_code;
MPRINT(XMLIMPORT_VANTAGE): var x;
MPRINT(XMLIMPORT_VANTAGE): run;
NOTE: There were 5866747 observations read from the data set SMPLUS.LINKS_RESPONSE_MECHANISM.
NOTE: The data set SMPLUS.RESPONSE_MECHANISM has 3209353 observations and 14 variables.
NOTE: Compressing data set SMPLUS.RESPONSE_MECHANISM decreased size by 37.98 percent.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 28:27.63
cpu time 28:34.64
V9 Library
MPRINT(XMLIMPORT_VANTAGE): proc transpose data = mplus.links_response_mechanism out = mplus.response_mechanism (drop = _NAME_)
prefix = rm_;
MPRINT(XMLIMPORT_VANTAGE): by item_id;
68 The SAS System 02:00 Thursday, August 8, 2013
MPRINT(XMLIMPORT_VANTAGE): id lookup_code;
MPRINT(XMLIMPORT_VANTAGE): var x;
MPRINT(XMLIMPORT_VANTAGE): run;
NOTE: There were 5866747 observations read from the data set MPLUS.LINKS_RESPONSE_MECHANISM.
NOTE: The data set MPLUS.RESPONSE_MECHANISM has 3209353 observations and 14 variables.
NOTE: Compressing data set MPLUS.RESPONSE_MECHANISM decreased size by 27.60 percent.
Compressed is 32271 pages; un-compressed would require 44572 pages.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 28.76 seconds
cpu time 28.79 seconds
Upvotes: 4
Views: 517
Reputation: 1
This issue usually occurs when PROC TRANSPOSE is used with BY-processing on compressed datasets. SAS is forced to read the same block of rows repeatedly decompressing them every time until all the records are fully sorted.
Set Compress=No option and it will work. See the log below, one program has Compress=yes and the other Compress=no, the former was 56 minutes vs .5 seconds.
OPTIONS COMPRESS=YES;
50 **tranpose from spde to spde;
51 proc transpose data=spdelib.balancewalkoutput out=spdelib.spdelib_to_spdelib;
52 var metric ;
53 by balancewalk facility_id isretained isexisting isicaapnpl monthofmaturity vintage;
54 run;
NOTE: There were 10000000 observations read from the data set SPDELIB.BALANCEWALKOUTPUT.
NOTE: The data set SPDELIB.SPDELIB_TO_SPDELIB has 160981 observations and 74 variables.
NOTE: Compressing data set SPDELIB.SPDELIB_TO_SPDELIB decreased size by 69.96 percent.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 56:58.54
user cpu time 52:03.65
system cpu time 4:03.00
memory 19028.75k
OS Memory 34208.00k
Timestamp 09/16/2019 06:19:55 PM
Step Count 9 Switch Count 22476
Page Faults 0
Page Reclaims 4056
Page Swaps 0
Voluntary Context Switches 142316
Involuntary Context Switches 5726
Block Input Operations 88
Block Output Operations 569200
OPTIONS COMPRESS=NO;
50 **tranpose from spde to spde;
51 proc transpose data=spdelib.balancewalkoutput out=spdelib.spdelib_to_spdelib;
52 var metric ;
53 by balancewalk facility_id isretained isexisting isicaapnpl monthofmaturity vintage;
18 The SAS System 16:04 Monday, September 16, 2019
54 run;
NOTE: There were 10000000 observations read from the data set SPDELIB.BALANCEWALKOUTPUT.
NOTE: The data set SPDELIB.SPDELIB_TO_SPDELIB has 160981 observations and 74 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 26.73 seconds
user cpu time 14.52 seconds
system cpu time 11.99 seconds
memory 13016.71k
OS Memory 27556.00k
Timestamp 09/16/2019 04:13:06 PM
Step Count 9 Switch Count 24827
Page Faults 0
Page Reclaims 2662
Page Swaps 0
Voluntary Context Switches 162653
Involuntary Context Switches 1678
Block Input Operations 96
Block Output Operations 1510040
Upvotes: 0
Reputation: 12701
There have been known issues with SPDE and proc compare in the past (not multi-threading), at least up to version 4.1. What version are you using? (can be seen in the “!install/logs” folder).
This is definitely something to raise with SAS support, to "speed" things along I would recommend submitting a log with the following options:
proc setinit noalias; run;
proc options; run;
%put _ALL_;
options fullstimer msglevel=i;
Also:
options spdedebug='DA_TRACEIO_OCR CJNL=Trace.txt';
(The CJNL option simply routes the trace message output to a text file)
In the meantime, you may be able to take advantage of some of the following SPD specific options:
http://support.sas.com/kb/11/349.html
Upvotes: 1
Reputation: 63434
Looks to me that there is some issue with PROC TRANSPOSE and SPDE. Here's a simple SSCCE, which has significant differences; not as significant as yours, but to some extent that may be a factor of this being on a desktop with not particularly substantial performance tuning in the first place. Sounds like a call to SAS tech support is in order.
libname spdelib spde 'c:\temp\SPDE Main'
datapath=('c:\temp\SPDE Data' 'd:\temp\SPDE Data')
indexpath=('d:\temp\SPDE Index')
partsize=512;
libname mainlib 'c:\temp\';
data mainlib.bigdata;
do ID = 1 to 1500000;
do _varn=1 to 10;
varname=cats("Var_",_varn);
vardata=ranuni(7);
output;
end;
end;
run;
data spdelib.bigdata;
do ID = 1 to 1500000;
do _varn=1 to 10;
varname=cats("Var_",_varn);
vardata=ranuni(7);
output;
end;
end;
run;
*These data steps take roughly the same amount of time, around 30 seconds each;
proc transpose data=spdelib.bigdata out=spdelib.transdata;
by id;
id varname;
var vardata;
run;
*Run a few times, this takes around 3 to 4 minutes, with 1.5 minutes CPU time;
proc transpose data=mainlib.bigdata out=mainlib.transdata;
by id;
id varname;
var vardata;
run;
*Run a few times, this takes around 30 to 45 seconds, with 20 seconds CPU time;
Upvotes: 3