Angelo Lapa Huamani
Angelo Lapa Huamani

Reputation: 53

How to accumulate the records of a newly imported table with the records of another table that I have stored on the servers in SAS?

I am new to SAS and I have the following problem: When trying to join records I just imported (in one table) with records I have stored in another table.

What happens is that I am going to run the code in SAS daily, and I need the table that I am going to create today (17/05/2021) by importing a file 'X', to join the table that I created yesterday (16/05/2021) by importing a file 'Y'. And so the code will be executed tomorrow, the next day and so on. In conclusion the records will accumulate as the days go by.

To tackle this problem, I am first creating two variables, one with the date of the day the code will be executed and the other with the date of the last execution.

%let daily_date                = 20210423; /*AAAAMMDD*/
%let last_execution_date = 20210422; /*AAAAMMDD*/

Then the import of a file is done, we can see that the name of this created table has the date of the day in which the code is being executed.

data InputAC.RA_ratings&daily_date;
infile "&ruta_InputRA." FIRSTOBS=2
dsd lrecl=4096 truncover;
input
@1 RA_Customer_ID $10.
@11 Rating_ID 10.
@21 ISRM_Model_Overlay_ID $10.
@31 Constant_ID 10.
@41 Value $100.
;
run;

proc sort data=inputac.RA_ratings&daily_date;
by RA_Customer_ID Rating_ID;
quit;

Finally the union of InputAC.RA_ratings&daily_date with InputAC.RA_ratings&last_execution_date is made. ('InputAC.RA_ratings&last_execution_date' should be the table that was imported at an earlier date than today.)

data InputAC.RA_ratings&fec_diario;
    merge
    InputAC.RA_ratings&fec_diario
    InputAC.RA_ratings&ultima_fecha_de_ejecucion;
    by RA_Customer_ID Rating_ID;
run;

This is how the tables are being stored on the server.

enter image description here

(Ignore date 20210413, let's imagine it is 20210422)

However, I have to perform this task without using the variable 'last_execution_date'.

I've been researching but I still can't find any SAS function that can help me with this problem. I hope someone can help me, thank you very much in advance.

Upvotes: 1

Views: 61

Answers (1)

Joe
Joe

Reputation: 63424

This is a pretty complex and interesting question from an operations point of view. The answer depends on a few things.

  1. How much control do you have over the execution of this process?
  2. Is "yesterday" guaranteed, or does the process need to work if "last execution date" is not yesterday?
  3. What should happen if the process is run twice today?

The best practices way to solve this is to have a dataset (or table) that stores the last execution date. That allows you to handle #2 trivially, and the answer to #3 might guide exactly how you store this but is easily handled anyway.

Say for example you have a table, MetaAC.LastExecDate (or, in spanish, MetaAC.UltimaFecha or similar). It could store things this way:

data LastExecDate;
  timestamp = datetime();
  execdate  = input(&daily_date,yymmdd8.);
run;

proc append base=MetaAC.LastExecDate data=LastExecDate;
run;

This lets you store an arbitrary execdate even if it's not today, and also store when you ran it (for audit purposes), and you could even add who ran it if that's interesting (there is a macro variable &sysuserid or similar). Then put all this at the bottom of your process, and it updates as you go.

Then, you can pull out from this the exact info you want - for example,

proc sql;
  select max(execdate)
    into :last_exec_date
    from MetaAC.LastExecDate
    where execdate ne today()
  ;
quit;

Now, if you don't have control over this for some reason, you could determine this in a different way. Again, the exact process depends on your circumstances and your answers to 2 and 3.

If your answer to 2 is you always want it to be yesterday, then this is really easy - just do this:

%let daily_date=20210517;
%let last_execution_date = %sysfunc(putn(%sysevalf(%sysfunc(inputn(&daily_date,yymmdd8.))-1),yymmddn8.));
%put &=last_execution_date;

The two %sysfuncs just do the input/put from SAS datastep inside the macro language, and %sysevalf lets you do math.

If you don't want it to always be the prior day (if there are weekends, or other days you don't necessarily want to assume it's the prior day), then your best bet is to either use the dictionary tables to look at what's there and find the largest date prior to your date, or maybe use a x command to look at the folder and do the same thing (might be easier to use OS command than to use SQL for this, sometimes SQL dictionary tables can be slow).

Upvotes: 2

Related Questions