Neal801
Neal801

Reputation: 307

SAS-Creating Panel by several datasets

Suppose there are ten datasets with same structure: date and price, particularly they have same time period but different price

 date       price
 20140604    5
 20140605    7
 20140607    9

I want to combine them and create a panel dataset. Since there is no name in each datasets, I attempt to add a new variable name into each data and then combine them.

The following codes are used to add name variable into each dataset

%macro name(sourcelib=,from=,going=); 
proc sql noprint;  /*read datasets in a library*/
  create table mytables as
  select *
  from dictionary.tables
  where libname = &sourcelib
  order by memname ;

  select count(memname) 
  into:obs 
  from mytables;

  %let obs=&obs.;

  select memname
  into : memname1-:memname&obs.
  from mytables;
quit;

  %do i=1 %to &obs.;

  data 
  &going.&&memname&i;
  set
  &from.&&memname&i;
 name=&&memname&i;
  run;
%end;
%mend;

So, is this strategy correct? Whether are there a different way to creating a panel data?

Upvotes: 1

Views: 158

Answers (3)

Tom
Tom

Reputation: 51601

There are really two ways to setup repeated measures data. You can use the TALL method that your code will create. That is generally the most flexible. The other would be a wide format with each PRICE being stored in a different variable. That is usually less flexible, but can be easier for some analyses.

You probably do not need to use macro code or even code generation to combine 10 datasets. You might find that it is easier to just type the 10 dataset names than to write complex code to pull the names from metadata. So a data step like this will let you list any number of datasets in the SET statement and use the membername as the value for the new PANEL variable that distinguishes the source dataset.

data want ;
  length dsn $41 panel $32 ;
  set in1.panel1 in1.panela in1.panelb indsname=dsn ;
  panel = scan(dsn,-1,'.') ;
run;

And if your dataset names follow a pattern that can be used as a member list in the SET statement then the code is even easier to write. So you could have a list of names that have a numeric suffix.

  set in1.panel1-in1.panel10 indsname=dsn ;

or perhaps names that all start with a particular prefix.

  set in1.panel:  indsname=dsn ;

If the different panels are for the same dates then perhaps the wide format is easier? You could then merge the datasets by DATE and rename the individual PRICE variables. That is generate a data step that looks like this:

data want ;
   merge in1.panel1 (rename=(price=price1))
         in1.panel2 (rename=(price=price2))
         ...
   ;
   by date;
run;

Or perhaps it would be easier to add a BY statement to the data set that makes the TALL dataset and then transpose it into the WIDE format.

data tall;
  length dsn $41 panel $32 ;
  set in1.panel1 in1.panela in1.panelb indsname=dsn ;
  by date ;
  panel = scan(dsn,-1,'.') ;
run;
proc transpose data=tall out=want ;
  by date;
  id panel;
  var price ;
run;

Upvotes: 2

DomPazz
DomPazz

Reputation: 12465

That is a valid way to achieve what you are looking for.

You are going to need 2 . in between the macros for library.data syntax. The first . is used to concatenate. The second shows up as a ..

I assume you will want to append all of these data sets together. You can add

data &going..want;
set
  %do i=1 %to &obs;
      &from..&&memname&i
  %end;
;
run;

You can combine your loop that adds the names and that data step like this:

data &going..want;
set
  %do i=1 %to &obs;
      &from..&&memname&i (in=d&i)
  %end;
;
%do i=1 %to &obs;
   if d&i then
      name = &&memname&i;
%end;
run;

Upvotes: 2

Peter Flom
Peter Flom

Reputation: 2388

I can't comment on the SQL code but the strategy is correct. Add a name to each data set and then panel on the name with the PANELBY statement.

Upvotes: 2

Related Questions