Reputation: 675
I'm a new SAS/SQL user and I have a dataset which I need to transpose some rows to columns. I think that there is a faster or easier way to do that and I'd like to some advice to you all. My example will better explain my issue:
Here is the dataset I have:
Month ID Car Claim_Type Cost_of_claim
1 1243 Ferrari Collision 12,000
2 6437 Peugeot Fire 50,000
5 0184 Citroen Stole 3,000
9 1930 Fiat Medical 1,000
3 2934 GM Liability 20,000
And I need to create a dataset like that:
Month ID Car Collision Fire Stole Medical Liability
1 1243 Ferrari 12,000 0 0 0 0
2 6437 Peugeot 0 50,000 0 0 0
5 0184 Citroen 0 0 3,000 0 0
9 1930 Fiat 0 0 0 1,000 0
3 2934 GM 0 0 0 0 20,000
I just transposed some rows to columns...
I was thinking in do something like that to create my new dataset:
proc sql;
select Month, ID, CAR
case when Claim_Type = 'Collision' then Cost_of_claim end Collision,
case when Claim_Type = 'Fire' then Cost_of_claim end Fire,
case when Claim_Type = 'Stole' then Cost_of_claim end Stole,
case when Claim_Type = 'Medical' then Cost_of_claim end Medical,
case when Claim_Type = 'Liability' then Cost_of_claim end Liability
from my_table;
The problem is that a have a huge amount of data and I think that this way might be not too efficient. Also, in my dataset I have much more columns and rows and don't want to type all the possibilities in case when
statements as it does not seems to be to easy (or user-friendly) to maintain the code.
Can someone help me in how to solve that?
Upvotes: 1
Views: 15238
Reputation: 353
PROC TRANSPOSE ought to do what you want.
data test;
input Month ID Car $ Claim_Type : $12. Cost_of_claim;
cards;
1 1243 Ferrari Collision 12000
2 6437 Peugeot Fire 50000
5 0184 Citroen Stole 3000
9 1930 Fiat Medical 1000
3 2934 GM Liability 20000
run;
proc transpose data=test out=transposed;
by notsorted month notsorted id notsorted car;
var cost_of_claim;
id claim_type;
run;
The output data set doesn't have the zeros off-diagonal but you can add those in a data step if you really want them.
Upvotes: 4
Reputation: 1120
This method populates a macro variable with all possible claim_types and loops through them, generating variables in the same way that your example code does, so you won't need to enter all possible cases. The "backstop" variable is used because of the comma in the loop (SAS will error without one more variable after the last comma in a proc sql step).
data have;
input Month ID Car $12. Claim_Type $12. Cost_of_claim;
datalines;
1 1243 Ferrari Collision 12000
2 6437 Peugeot Fire 50000
5 0184 Citroen Stole 3000
9 1930 Fiat Medical 1000
3 2934 GM Liability 20000
;
run;
%macro your_macro;
proc sql noprint;
select distinct claim_type into: list_of_claims separated by " " from have;
create table want (drop = backstop) as select
month, id, car,
%do i = 1 %to %sysfunc(countw(&list_of_claims.));
%let this_claim = %scan(&list_of_claims., &i.);
case when claim_type = "&this_claim." then cost_of_claim else 0 end as &this_claim.,
%end;
1 as backstop
from have;
quit;
%mend your_macro;
%your_macro;
Upvotes: 0
Reputation: 8584
You can try dynamic sql and pivot but performance will depend on how many different claim types you have.
create table #mytable (Month int, ID int, Car varchar(20), Claim_Type varchar(20), Cost_of_claim int)
insert into #mytable values
(1, 1243, 'Ferrari', 'Collision', 12000)
, (2, 6437, 'Peugeot', 'Fire', 50000)
, (5, 184, 'Citroen', 'Stole', 3000)
, (9, 1930, 'Fiat', 'Medical', 1000)
, (3, 2934, 'GM', 'Liability', 20000)
, (12, 4455, 'Ford', 'Theft', 20)
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT ',' + QUOTENAME(Claim_Type)
from #mytable
group by Claim_Type
order by Claim_Type
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = N'SELECT ' + 'month,id,car,' + @cols + N' from
(
select month,id, car, Cost_of_claim, Claim_Type
from #mytable
) x
pivot
(
max(Cost_of_claim)
for Claim_Type in (' + @cols + N')
) p
'
exec sp_executesql @query;
drop table #mytable
Upvotes: 0