Reputation: 45
I am trying to re-code some SAS code into Python. I have the below SAS code:
DATA DF_FINAL;
RETAIN UEN UEN_NO FEE;
SET DF_ADJ1 DF_ADJ2;
KEEP UEN UEN_NO FEE;
RUN;
I don't understand what RETAIN is needed for, and I need the equivalent in Python. I tried running the code without the RETAIN line but got the same output. Please assist.
Thank you
Upvotes: 1
Views: 132
Reputation: 1
Well, as far as I know, the variables listed in a RETAIN statement keep the values they have in the parent dataset, which means they are not processed like other variables (and their values) on each iteration. Put simply, they skip the buffer and are placed directly in the output dataset (formats included). That saves time and machine resources.
Upvotes: 0
Reputation: 51566
The real purpose of a RETAIN statement is to indicate that a NEW variable being calculated in the data step should NOT have its value reset to missing when the data step starts processing the next observation.
In this step the RETAIN's formal purpose has no effect. That is because the data step is not calculating any new variables. The only source of values for variables are the input datasets. And variables sourced from input datasets are already "retained".
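Since the question asks about Python, here is a rough analogy of that formal purpose. It is only a sketch: the `fees` list and the `total` variable are made up for illustration, and a plain loop stands in for the data step's implicit loop over observations.

```python
# Hypothetical data; each element plays the role of one observation's FEE.
fees = [10, 20, 30]

# Without RETAIN: a new variable is reset to missing at the top of each iteration.
without_retain = []
for fee in fees:
    total = None                 # reset to "missing" for every observation
    total = (total or 0) + fee   # roughly like total = sum(total, fee) in SAS
    without_retain.append(total)

# With RETAIN: the variable keeps its value from the previous iteration.
with_retain = []
total = 0                        # roughly like RETAIN total 0;
for fee in fees:
    total = total + fee
    with_retain.append(total)

print(without_retain)  # [10, 20, 30]
print(with_retain)     # [10, 30, 60]
```

Nothing like `total` is computed in the data step from the question, which is why removing RETAIN does not change the values there.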
So the RETAIN statement's only purpose in that data step is to make sure that UEN and UEN_NO are the first two variables in the output dataset. So when you print or look at the data, those two will appear in columns 1 and 2.
It works because SAS creates the list of variables in the data step in the order it first sees them.
The reason people use RETAIN instead of some other statement to get this side effect of setting the variable order is that unlike references to variable names in other statements (like an assignment statement) SAS does not force a TYPE on the variable when it sees it in the RETAIN statement. So the type and storage length will be determined by how those variables are defined in the source dataset(s).
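In pandas, the closest thing to this ordering side effect is probably selecting the columns with a list, which sets the order without touching the types. A small sketch, assuming a made-up DataFrame `df` (the OTHER column and all values are hypothetical):

```python
import pandas as pd

# Hypothetical input: FEE is stored first and there is an extra column.
df = pd.DataFrame({
    "FEE": [100.5, 200.0],
    "OTHER": ["x", "y"],
    "UEN": ["A1", "B2"],
    "UEN_NO": [1, 2],
})

# Selecting with a list sets the column order (like the RETAIN trick)
# and drops everything not listed (like KEEP).
ordered = df[["UEN", "UEN_NO", "FEE"]]

# The dtypes still come from the source data; nothing is forced here.
same_types = all(ordered[col].dtype == df[col].dtype for col in ordered.columns)

print(list(ordered.columns))  # ['UEN', 'UEN_NO', 'FEE']
print(same_types)             # True
```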
Upvotes: 3
Reputation: 11
In SAS, the RETAIN statement is used to initialize and retain the values of variables across iterations of the data step. However, in the code snippet you provided, the RETAIN statement seems unnecessary. It initializes the variables UEN, UEN_NO, and FEE but does not seem to serve a specific purpose, especially since these variables are set using the SET statement later in the data step.
In SAS, the SET statement reads an observation from a dataset and copies the values of variables from that observation to the program data vector (PDV). In your case, the SET statement is reading observations from datasets DF_ADJ1 and DF_ADJ2, but the RETAIN statement is not influencing this process. The RETAIN statement is typically used when you want to carry forward values across iterations. If you're using pandas, try concatenating the two DataFrames along the rows using pd.concat.
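A minimal sketch of what that could look like, assuming DF_ADJ1 and DF_ADJ2 are already loaded as DataFrames. The `df_adj1` and `df_adj2` frames below are hypothetical stand-ins with made-up values, including an EXTRA column that only exists to show the KEEP behaviour:

```python
import pandas as pd

# Hypothetical stand-ins for DF_ADJ1 and DF_ADJ2.
df_adj1 = pd.DataFrame({
    "FEE": [10.0, 20.0],
    "UEN": ["A", "B"],
    "UEN_NO": [1, 2],
    "EXTRA": [0, 0],
})
df_adj2 = pd.DataFrame({
    "UEN": ["C"],
    "UEN_NO": [3],
    "FEE": [30.0],
})

# SET DF_ADJ1 DF_ADJ2 stacks the rows of both datasets: pd.concat along the rows.
# The column list then does the job of KEEP plus the RETAIN ordering.
df_final = pd.concat([df_adj1, df_adj2], ignore_index=True)[["UEN", "UEN_NO", "FEE"]]

print(df_final)
```

The `ignore_index=True` argument simply renumbers the rows of the result; drop it if the original index values matter.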
Upvotes: 1