Reputation: 735
I'm getting an Out Of Memory error when running a program with an input of ~5M obs. and 153 variables.
When I run the same program on ~40,000 obs., it executes and completes fine. When I increase the number of input obs. to more than 100K, I get the error: Out of memory.
I tried to follow the tips below and unchecked a few boxes:
1. Tools ► Options ► Results General ► deselect all Result Formats.
2. Tools ► Options ► Results General ► deselect Open generated data/results automatically.
3. Tools ► Options ► Results General ► deselect Link handcoded ODS results.
4. Tools ► Options ► SAS Programs ► deselect Automatically direct results back to SAS Enterprise Guide.
Still not working.
I use Enterprise Guide 5.1, but eventually I will insert the working code into a User Written Code transformation in Data Integration Studio 4.6.
Does anyone have a clue how to get this program to work?
Edit:
This is the piece of code that causes the error message:
data _null_;
    set all_include_Stornos;
    IF TREATMENT_IND = 1 AND DDS_ROW_IND NE 1 THEN DO;
        CALL EXECUTE ('%STORNO_TKUFA ('||POLICY_RK||');');
        CALL EXECUTE ('%UPDATE (STORNO_TKUFA_CUMULATE);');
    END;
    IF TREATMENT_IND in (4) AND DDS_ROW_IND NE 1 THEN DO;
        CALL EXECUTE ('%HAKPAA ('||POLICY_RK||','||POLICY_VERSION||');');
        CALL EXECUTE ('%UPDATE(HAKPAA_CUMULATE);');
    END;
    IF TREATMENT_IND = 5 AND DDS_ROW_IND NE 1 and count_bitul_no < 2 or (count_bitul_no > 1 and max_bitul_ver = policy_version) THEN DO;
        CALL EXECUTE ('%BITUL ('||POLICY_RK||','||POLICY_VERSION||');');
        CALL EXECUTE ('%UPDATE(BITUL_CUMULATE);');
    END;
    IF TREATMENT_IND = 6 AND DDS_ROW_IND NE 1 THEN DO;
        CALL EXECUTE ('%LAST_STATE ('||POLICY_RK||','||POLICY_VERSION||');');
        CALL EXECUTE ('%UPDATE (LAST_STATE_calc);');
    END;
    IF DDS_ROW_IND NE 1 and ((PREV_TREATMENT_IND = 4 AND TREATMENT_IND NOT IN (1,2,5)) or treatment_ind = 3) THEN DO;
        CALL EXECUTE ('%HAFSHARA ('||POLICY_RK||','||POLICY_VERSION||');');
        CALL EXECUTE ('%UPDATE (HAFSHARA_CUMULATE);');
    END;
    IF TREATMENT_IND = 2 AND POLICY_VERSION - 1 = max_bitul_ver AND DDS_ROW_IND NE 1 THEN DO;
        CALL EXECUTE ('%STORNO_BITUL ('||POLICY_RK||','||POLICY_VERSION||');');
        CALL EXECUTE ('%UPDATE (STORNO_BITUL_CUMULATE);');
    END;
RUN;
These are the two macros (shortened) that are executed when TREATMENT_IND = 5:
%macro BITUL (pol_rk , pol_ver );
proc sql;
create table macro_BITUL
as select * from all_include_Stornos
where policy_rk = &pol_rk
and treatment_ind_5 = &pol_ver
order by policy_rk, policy_version;
quit;
data BITUL_calc;
set macro_BITUL;
BY POLICY_RK;
IF LAST.policy_rk THEN ACT_DAILY_AMT_END_DT = POLICY_VERSION_END_DT;
ELSE ACT_DAILY_AMT_END_DT = NEXT_POLICY_VERSION_START_DT;
VERSION_EXPOSURE_DAYS_NO = ACT_DAILY_AMT_END_DT - ACT_DAILY_AMT_START_DT + 1;
BITUL_DURATION = (POLICY_EXPIRATION_DT_5 - POLICY_VERSION_START_DT + 1) / (POLICY_VERSION_END_DT_5 - POLICY_VERSION_START_DT + 1);
GAINED_NET_PREMIUM_V_AMT = NET_PREMIUM_V_AMT *BITUL_DURATION;
.
.
.
GAINED_NET_COMMISION_B_IB_V_AMT = PRODUCR_B_NET_COMM_IB_V_AMT * BITUL_DURATION;
run;
/**/
PROC SQL;
CREATE TABLE TOTAL_GAINED AS
SELECT POLICY_RK,
SUM(GAINED_NET_PREMIUM_V_AMT) AS TOT_GAINED_NET_PREMIUM_V_AMT,
SUM(GAINED_NET_FEES_V_AMT) AS TOT_GAINED_NET_FEES_V_AMT,
.
.
.
SUM(GAINED_NET_COMMISION_B_IB_V_AMT) AS TOT_GAINED_NET_COMMN_B_IB_V_AMT
FROM BITUL_calc
GROUP BY POLICY_RK;
QUIT;
PROC SQL;
CREATE TABLE BITUL_calc_AND_TOTALS AS
SELECT A.* ,
TOT_GAINED_NET_PREMIUM_V_AMT,
.
.
JOIN TOTAL_GAINED AS B
ON
(A.POLICY_rK = B.POLICY_RK
)order by policy_rk, policy_version;
QUIT;
DATA bitul_CALCULATED;
SET BITUL_calc_AND_TOTALS;
IF TOT_GAINED_NET_PREMIUM_V_AMT = 0 THEN CALCULATED_NET_PREMIUM_V_AMT = 0;
ELSE CALCULATED_NET_PREMIUM_V_AMT = NET_PREMIUM_AMT_5 * GAINED_NET_PREMIUM_V_AMT / TOT_GAINED_NET_PREMIUM_V_AMT;
.
.
ELSE CALC_NET_COMMISION_B_IB_V_AMT = PRODUCR_B_NET_COMM_IB_AMT_5 * GAINED_NET_COMMISION_B_IB_V_AMT / TOT_GAINED_NET_COMMN_B_IB_V_AMT;
.
.
.
.
.
run;
DATA BITUL_CUMULATE;
SET bitul_CALCULATED;
.
.
.
.
.
.
run;
%mend ;
%MACRO UPDATE (TABLE_NAME);
PROC SQL;
DELETE FROM all_include_Stornos
WHERE CATS(POLICY_RK,POLICY_VERSION) IN ( SELECT CATS(POLICY_RK,POLICY_VERSION) FROM &TABLE_NAME);
INSERT INTO all_include_Stornos
select * FROM &TABLE_NAME
;QUIT;
/* PROC SORT DATA=all_include_Stornos OUT=all_include_Stornos; BY POLICY_RK POLICY_VERSION;RUN;*/
%MEND;
Upvotes: 0
Views: 4571
Reputation: 51566
Either add %nrstr() to your CALL EXECUTE statements, so that only the short macro calls are pushed onto the execution queue and the macros themselves run after the data step ends, instead of each macro being expanded immediately and all of its generated code piling up in memory:
CALL EXECUTE ('%nrstr(%STORNO_TKUFA) ('||POLICY_RK||');');
CALL EXECUTE ('%nrstr(%UPDATE) (STORNO_TKUFA_CUMULATE);');
Or, perhaps better, change your data _null_ step to write the generated code to a file and %include that file afterwards:
filename code temp;
data _null_;
set all_include_Stornos ;
IF TREATMENT_IND = 1 AND DDS_ROW_IND NE 1 THEN DO;
put '%STORNO_TKUFA (' POLICY_RK ');';
put '%UPDATE (STORNO_TKUFA_CUMULATE);';
...
RUN;
%include code ;
Upvotes: 0
Reputation: 1804
First you need to check how much memory is assigned to your SAS session and to sorting. The relevant options are called MEMSIZE and SORTSIZE.
Run this code and you will see their values, in bytes, in the log:
proc options group=memory option=(memsize sortsize) ; run;
I recommend adding this line as the first line in your code to give sorting the maximum available memory: options sortsize=MAX;
and see if that solves the issue.
Otherwise you can change MEMSIZE to MAX, following the steps in this post.
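Note that, unlike SORTSIZE, MEMSIZE can only be set when the SAS session starts, not with an OPTIONS statement, so for an Enterprise Guide workspace server it normally has to go into the server's configuration file or launch command. A sketch of the relevant lines (sasv9_usermods.cfg is a typical location, assumed here, not confirmed for your site):
/* Start-up options in the SAS configuration file (e.g. sasv9_usermods.cfg) */
-MEMSIZE MAX
-SORTSIZE MAX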
Upvotes: 0
Reputation: 3963
Based on this: "I know on which data step it throws me out. Thing is, it's a data null calls different macros according to if statements."
Take a look at how CALL EXECUTE works (http://www2.sas.com/proceedings/sugi22/CODERS/PAPER70.PDF is helpful).
Essentially, CALL EXECUTE parses the macro code immediately, but it queues up the resulting SAS steps until after the current data step finishes. In other words, you are potentially building up millions upon millions of lines of SAS code in memory that SAS is simply storing up to be executed when that data _null_ step finishes. Eventually, this gets so large that the session runs out of memory.
How to deal with this? Write a macro that breaks the data _null_ step into chunks. You can use proc sql to select the number of records in all_include_Stornos into a macro variable, then use a macro loop to run the data step multiple times, using the firstobs= and obs= data set options to process only a portion of the data on each pass. That way the buffer built up by call execute stays at a reasonable size, which should prevent you from running out of memory.
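A minimal sketch of that chunking approach; the macro name run_in_chunks and the chunk size of 100,000 are illustrative assumptions, while the dataset name comes from the question:
%macro run_in_chunks(chunksize=100000);
    %local nobs start stop;
    /* Count the rows in the driver dataset */
    proc sql noprint;
        select count(*) into :nobs trimmed
        from all_include_Stornos;
    quit;
    /* Run the driver DATA step once per chunk so the CALL EXECUTE queue stays small */
    %do start = 1 %to &nobs %by &chunksize;
        %let stop = %sysfunc(min(&nobs, %eval(&start + &chunksize - 1)));
        data _null_;
            set all_include_Stornos (firstobs=&start obs=&stop);
            /* ... the existing IF/THEN blocks with CALL EXECUTE go here ... */
        run;
    %end;
%mend run_in_chunks;
%run_in_chunks(chunksize=100000)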
Upvotes: 1
Reputation: 1283
Based on the edit and your comment, I suspect that the ORDER BY and/or PROC SORT steps simply cannot handle the data volume on the system they are running on.
50M is a lot of records, but still manageable on most systems. If your data grows to 500M records, though, it might be too much.
I recall a client from a while ago whose dataset had 120M rows and (IIRC) around 300 variables, and we ran into similar issues.
Luckily, there are some ways to handle larger amounts of data in a sort.
1. Use the tagsort option
You can add tagsort as an option to the proc sort statement. In that case, SAS only sorts using the columns that are in the by statement and attaches the remaining part of the records afterwards. It is less resource-intensive, but of course, it is slower.
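A minimal sketch of what that looks like, using the dataset and key variables from the question:
/* TAGSORT sorts only the BY-variable "tags" and re-attaches the rest of each record afterwards */
proc sort data=all_include_Stornos tagsort;
    by policy_rk policy_version;
run;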
2. Use indexes for by statements
If you need the data to be in a particular order for a by statement later on in a proc/data step, you can create an index on these variables instead of sorting the data. My experience is that this will allow SAS to cope with larger datasets as well. Of course, it is slower, it consumes more space (for the index) and you need to take into account the existence (and destruction) of indexes.
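For example, a composite index on the two key variables could look like this (sketch only; the index name pol_key is arbitrary and the table is assumed to live in the WORK library):
proc datasets library=work nolist;
    modify all_include_Stornos;
    index create pol_key = (policy_rk policy_version);
quit;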
3. Split the dataset, sort the parts and stitch everything together afterwards
A final method (which is probably preferable to option 2 in many cases) is to split the large dataset into several smaller parts, sort each part separately by the key variables, and then interleave the sorted parts back into a single dataset with a SET statement combined with a BY statement.
Both this approach and tagsort are explained here: http://www.nesug.org/Proceedings/nesug12/cc/cc36.pdf
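A sketch of that pattern, splitting into two parts and interleaving them back together; the part names, output name, and the split rule are illustrative:
/* Split the large dataset into two smaller parts */
data part1 part2;
    set all_include_Stornos;
    if mod(_n_, 2) = 1 then output part1;
    else output part2;
run;
/* Sort each part separately */
proc sort data=part1; by policy_rk policy_version; run;
proc sort data=part2; by policy_rk policy_version; run;
/* SET with BY interleaves the sorted parts into one ordered dataset */
data all_include_Stornos_sorted;
    set part1 part2;
    by policy_rk policy_version;
run;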
Note that all of these methods require you to remove the ORDER BY clauses from the PROC SQL steps and follow those steps with the appropriate PROC SORT instead.
Upvotes: 0