Reputation: 49
I have a table, that looks like:
ID NAME TYPE DATE_VALUE STRING_VALUE NUMBER_VALUE
1 revenue report_date 01jan2018
1 revenue name Revenue_12m
1 revenue code 1100
1 revenue sum 6000000
I want it to look like:
ID NAME report_date name code sum
1 revenue 01jan2018 Revenue_12m 1100 6000000
My code, that gives wrong result (3 rows):
proc transpose data=tab1 out=tab2;
by id name;
var number_value string_value date_value;
id type;
run;
Upvotes: 1
Views: 67
Reputation: 107652
Consider conditional aggregation in SQL, assuming your types are a handful of items:
proc sql;
create table output as
select ID, NAME,
MAX(case when TYPE = 'name' then STRING_VALUE else "" end) AS report_name,
MAX(case when TYPE = 'report_date' then DATE_VALUE else . end) AS report_date format=date9.,
MAX(case when TYPE = 'code' then NUMBER_VALUE else . end) AS code,
MAX(case when TYPE = 'sum' then NUMBER_VALUE else . end) AS sum
from raw
group ID, NAME;
quit;
Input
data raw;
infile datalines delimiter=',' DSD;
length NAME $ 7 TYPE $ 11 STRING_VALUE $11 NUMBER_VALUE 7;
input ID NAME $ TYPE $ STRING_VALUE $ NUMBER_VALUE DATE_VALUE date9.;
format DATE_VALUE DATE9.;
datalines;
1,revenue,report_date,,.,01jan2018
1,revenue,name,Revenue_12m,.,.
1,revenue,code,,1100,.
1,revenue,sum,,6000000,.
;
Output
ID NAME report_name report_date code sum
1 revenue Revenue_12m 01JAN2018 1100 6000000
Upvotes: 1
Reputation: 51566
You could solve this by running PROC TRANSPOSE three times, once for each data type.
You also need to remove the conflict over the variable named NAME
.
So first let's create your example data.
data have;
input ID $ xNAME $ TYPE :$32. DATE_VALUE :date. STRING_VALUE :$50. NUMBER_VALUE;
format date_value date9.;
cards;
1 revenue report_date 01jan2018 . .
1 revenue name . Revenue_12m .
1 revenue code . . 1100
1 revenue sum . . 6000000
;
Now we can generate three separate datasets.
proc transpose data=have out=dates(drop=_name_);
where not missing(date_value);
by id xname;
var date_value;
id type;
run;
proc transpose data=have out=strings(drop=_name_);
where not missing(string_value);
by id xname;
var string_value;
id type;
run;
proc transpose data=have out=numbers(drop=_name_);
where not missing(number_value);
by id xname;
var number_value;
id type;
run;
And combine them.
data want;
merge dates strings numbers;
by id xname;
run;
Note that this method will not create columns for values of TYPE that are always missing. You could change the WHERE statements to instead list the values of TYPE that use that particular source variable instead. But then you would need to know the list of possible values of TYPE in advance, in which case why not just write a data step to do the transform. Something like this:
data want;
length id $8 xname $50 report_date 8 name $50 code sum 8 ;
format report_date date9.;
do until(last.xname);
set have ;
by id xname ;
select (type);
when ('report_date') report_date=date_value;
when ('name') name=string_value;
when ('code') code=number_value;
when ('sum') sum=number_value;
otherwise put 'WARNING: Unknown type. ' id= xname= type= ;
end;
end;
drop type date_value string_value number_value;
run;
If you have metadata about your data you could use that to define the variables and generate the SELECT statement.
Upvotes: 0