Bendy
Bendy

Reputation: 3576

Standardising dataset attributes across projects

Background:

I have multiple old projects that I need to standardise (PRJ01-PRJ10). Each is stored under its own libname and each has around 30 datasets (note: not all studies have the same 30 datasets).

The variable names have remained consistent across the projects. However, over the years, the labels and formats that have been assigned to these variable names have changed in places - for example as below:

Attribute inconsistencies BETWEEN studies:

data PRJ01.users(label='user identifiers') ;
  attrib USERID label='Username' format=$20. ;
run ;

data PRJ02.users(label='User Identifiers') ;
  attrib USERID label='Name of user' format=$15. ;
run;

Attribute inconsistencies WITHIN studies:

data PRJ02.users(label='User Identifiers') ;
  attrib USERID label='Name of user' format=$15. ;
run; 

data PRJ02.orders(label='Orders') ;
  attrib USERID  label='Name of User' format=$15.) 
         ORDERNO label='Order number' format=8. ;
run ;

I have written a program to report all inconsistencies. However, I need to generate 'tidy' copies of all projects giving them all a standardised structure. My current thinking is that I should create a dataset of standard variables as below that we can add and adjust until we have everything defined in there:

data standards ;
  attrib USERID  label='Username                                ' format=$20. 
         ORDERNO label='Order Number                            ' format=8.
;run ;

Question:

From this standards dataset, what is the best way to apply the attributes to where ever these variables exist?

I will write the output datasets to new libnames eg: PRJ01.users --> PRJSTD01.users and put errors to the log if there are any variables that are changed where the variable length is getting truncated.

Upvotes: 2

Views: 70

Answers (1)

DWal
DWal

Reputation: 2762

Create a dictionary table containing your standards:

name     label         format
USERID   Username      $20.
ORDERNO  Order Number  8.

Join to dictionary table containing column names in your library:

proc sql;
create table standards2 as
select
  d.memname,
  s.name,
  s.label,
  s.format
from
  sashelp.vcolumn d
  inner join standards s
  on d.name = s.name
where
  libname eq 'PRJ01'
order by
  d.memname,
  s.name
;
quit;

To get this:

memname    name     label         format
users      USERID   Username      $20.
orders     USERID   Username      $20.
orders     ORDERNO  Order Number  8.

Then read this data set using put statements to create a proc datasets that performs your modifications.

filename gencode temp;
data _null_;
  set standards2 end=eof;
  by memname;
  file gencode;
  if _n_ = 1 then put       "proc datasets lib=PRJ01 nolist;";
  if first.memname then put "  modify " memname ";";
  put                       "  label " name "='" label "';";
  put                       "  format " name format ";";
  if eof then put           "quit;";
run;
%include gencode / source2;
filename gencode clear;

(stolen from this paper)

You should be able to modify to match the rest of your requirements (copying to new libraries, iterating over projects).

Upvotes: 2

Related Questions