Reputation: 3576
Background:
I have multiple old projects that I need to standardise (PRJ01
-PRJ10
). Each is stored under its own libname and each has around 30 datasets (note: not all studies have the same 30 datasets).
The variable names have remained consistent across the projects. However, over the years, the labels and formats that have been assigned to these variable names have changed in places - for example as below:
Attribute inconsistencies BETWEEN studies:
data PRJ01.users(label='user identifiers') ; attrib USERID label='Username' format=$20. ; run ; data PRJ02.users(label='User Identifiers') ; attrib USERID label='Name of user' format=$15. ; run;
Attribute inconsistencies WITHIN studies:
data PRJ02.users(label='User Identifiers') ; attrib USERID label='Name of user' format=$15. ; run; data PRJ02.orders(label='Orders') ; attrib USERID label='Name of User' format=$15.) ORDERNO label='Order number' format=8. ; run ;
I have written a program to report all inconsistencies. However, I need to generate 'tidy' copies of all projects giving them all a standardised structure. My current thinking is that I should create a dataset of standard variables as below that we can add and adjust until we have everything defined in there:
data standards ;
attrib USERID label='Username ' format=$20.
ORDERNO label='Order Number ' format=8.
;run ;
Question:
From this standards
dataset, what is the best way to apply the attributes to where ever these variables exist?
I will write the output datasets to new libnames eg: PRJ01.users
--> PRJSTD01.users
and put errors to the log if there are any variables that are changed where the variable length is getting truncated.
Upvotes: 2
Views: 70
Reputation: 2762
Create a dictionary table containing your standards:
name label format
USERID Username $20.
ORDERNO Order Number 8.
Join to dictionary table containing column names in your library:
proc sql;
create table standards2 as
select
d.memname,
s.name,
s.label,
s.format
from
sashelp.vcolumn d
inner join standards s
on d.name = s.name
where
libname eq 'PRJ01'
order by
d.memname,
s.name
;
quit;
To get this:
memname name label format
users USERID Username $20.
orders USERID Username $20.
orders ORDERNO Order Number 8.
Then read this data set using put
statements to create a proc datasets
that performs your modifications.
filename gencode temp;
data _null_;
set standards2 end=eof;
by memname;
file gencode;
if _n_ = 1 then put "proc datasets lib=PRJ01 nolist;";
if first.memname then put " modify " memname ";";
put " label " name "='" label "';";
put " format " name format ";";
if eof then put "quit;";
run;
%include gencode / source2;
filename gencode clear;
(stolen from this paper)
You should be able to modify to match the rest of your requirements (copying to new libraries, iterating over projects).
Upvotes: 2