user3013203
user3013203

Reputation: 49

SAS transpose table with several VAR options

I have a table, that looks like:

ID     NAME    TYPE        DATE_VALUE    STRING_VALUE      NUMBER_VALUE  
1    revenue report_date    01jan2018
1    revenue    name                     Revenue_12m
1    revenue    code                                           1100
1    revenue    sum                                           6000000

I want it to look like:

ID     NAME     report_date     name          code       sum
1    revenue     01jan2018    Revenue_12m     1100     6000000

My code, that gives wrong result (3 rows):

proc transpose data=tab1 out=tab2;
by id name;
var number_value string_value date_value;
id type;
run;

Upvotes: 1

Views: 67

Answers (2)

Parfait
Parfait

Reputation: 107652

Consider conditional aggregation in SQL, assuming your types are a handful of items:

proc sql;
    create table output as
    select ID, NAME,
           MAX(case when TYPE = 'name' then STRING_VALUE else "" end) AS report_name,
           MAX(case when TYPE = 'report_date' then DATE_VALUE else . end) AS report_date format=date9.,
           MAX(case when TYPE = 'code' then NUMBER_VALUE else . end) AS code,
           MAX(case when TYPE = 'sum' then NUMBER_VALUE else . end) AS sum
    from raw
    group ID, NAME;
quit;

Input

data raw;
    infile datalines delimiter=',' DSD; 
    length NAME $ 7 TYPE $ 11 STRING_VALUE $11 NUMBER_VALUE 7;
    input ID NAME $ TYPE $ STRING_VALUE $ NUMBER_VALUE DATE_VALUE date9.;
    format DATE_VALUE DATE9.;
    datalines;
1,revenue,report_date,,.,01jan2018
1,revenue,name,Revenue_12m,.,.
1,revenue,code,,1100,.
1,revenue,sum,,6000000,.
;

Output

ID    NAME   report_name   report_date  code     sum 
 1 revenue   Revenue_12m     01JAN2018  1100 6000000 

Upvotes: 1

Tom
Tom

Reputation: 51566

You could solve this by running PROC TRANSPOSE three times, once for each data type. You also need to remove the conflict over the variable named NAME.

So first let's create your example data.

data have;
  input ID $ xNAME $ TYPE :$32. DATE_VALUE :date. STRING_VALUE :$50. NUMBER_VALUE;
  format date_value date9.;
cards;
1 revenue report_date 01jan2018 . .
1 revenue name        . Revenue_12m .
1 revenue code        . . 1100
1 revenue sum         . . 6000000
;

Now we can generate three separate datasets.

proc transpose data=have out=dates(drop=_name_);
  where not missing(date_value);
  by id xname;
  var date_value;
  id type;
run;
proc transpose data=have out=strings(drop=_name_);
  where not missing(string_value);
  by id xname;
  var string_value;
  id type;
run;
proc transpose data=have out=numbers(drop=_name_);
  where not missing(number_value);
  by id xname;
  var number_value;
  id type;
run;

And combine them.

data want;
  merge dates strings numbers;
  by id xname;
run;

Note that this method will not create columns for values of TYPE that are always missing. You could change the WHERE statements to instead list the values of TYPE that use that particular source variable instead. But then you would need to know the list of possible values of TYPE in advance, in which case why not just write a data step to do the transform. Something like this:

data want;
  length id $8 xname $50 report_date 8 name $50 code sum 8 ;
  format report_date date9.;
  do until(last.xname);
    set have ;
    by id xname ;
    select (type);
      when ('report_date')  report_date=date_value;
      when ('name')  name=string_value;
      when ('code')  code=number_value;
      when ('sum')  sum=number_value;
      otherwise put 'WARNING: Unknown type. ' id= xname= type= ;
    end;
  end;
  drop type date_value string_value number_value;
run;

If you have metadata about your data you could use that to define the variables and generate the SELECT statement.

Upvotes: 0

Related Questions