Reputation: 91
I am working with some state-level data and want to create a collective "USA" level within that state data. I have about 10 years of data for each state and would like to keep that format for the "USA" level of the variable. Is there an easy way I can do this? I will attach some sample data below.
data test;
input State$ year count1 count2;
datalines;
MA 2010 33 34
MA 2011 54 32
MA 2012 45 43
MA 2013 49 50
LA 2010 29 29
LA 2011 32 34
LA 2012 34 44
LA 2013 44 45
WV 2010 33 34
WV 2011 34 37
WV 2012 44 43
WV 2013 33 35
;
run;
Here is what I would like the data to look like:
data test_2;
input State$ year count1 count2;
datalines;
MA 2010 33 34
MA 2011 54 32
MA 2012 45 43
MA 2013 49 50
LA 2010 29 29
LA 2011 32 34
LA 2012 34 44
LA 2013 44 45
WV 2010 33 34
WV 2011 34 37
WV 2012 44 43
WV 2013 33 35
USA 2010 95 97
USA 2011 120 103
USA 2012 121 130
USA 2013 126 130
;
run;
Upvotes: 1
Views: 86
Reputation: 27508
Instead of adding rows that are for a dimensional summary, consider using Proc REPORT and its full feature set that includes using a NOTSORTED format and MULTILABEL format processing.
Example:
proc format ;
value $geo (multilabel notsorted)
'MA' = 'MA'
'LA' = 'LA'
'WV' = 'WV'
'MA', 'LA', 'WV' = 'USA'
;
run ;
ods html file='report.html' ;
proc report data=test ;
column state year count1 count2 ;
define state / group order=data preloadfmt format=$geo. mlf ;
define year / group ;
run ;
ods html close ;
Upvotes: 2
Reputation: 51611
Since the grouping variable YEAR can only take on a small number of integer values you could easily use a temporary array to store the totals. You could even make it flexible enough to determine the range of years that appear in the data.
data have;
input State $ year count1 count2;
datalines;
MA 2010 33 34
MA 2011 54 32
MA 2012 45 43
MA 2013 49 50
LA 2010 29 29
LA 2011 32 34
LA 2012 34 44
LA 2013 44 45
WV 2010 33 34
WV 2011 34 37
WV 2012 44 43
WV 2013 33 35
;
data want;
set have end=eof;
array total[1960:2099,2] _temporary_ (280*0);
array years[2] _temporary_ (2099 1960);
total[year,1]+count1;
total[year,2]+count2;
years[1]= min(years[1],year);
years[2]= max(years[2],year);
output;
if eof then do;
state='USA';
do year=years[1] to years[2];
count1=total[year,1];
count2=total[year,2];
output;
end;
end;
run;
Upvotes: 1
Reputation: 12909
Here are two ways to do it: with SQL and PROC MEANS.
SQL
Append the sums by year you with an Outer Union.
proc sql;
create table want as
select *
from test
OUTER UNION CORR
select 'USA' as State
, year
, sum(count1) as count1
, sum(count2) as count2
from test
group by year
;
quit;
PROC MEANS
Calculate the totals for State*Year
and append it to the main dataset. We include State
in there so we can include the original values. It will make the grand total a missing value for State
but we can modify that after-the-fact.
proc means data=test noprint;
class State year;
types Year State*Year;
output out=want(drop=_:)
sum=
;
run;
data want;
modify want;
where state = ' ';
state = 'USA';
run;
Upvotes: 3
Reputation: 2776
Using the DATA step:
proc sort data=test out=want;
by year;
run;
* Calculate USA totals;
data want;
set want;
by year;
total1 + count1;
total2 + count2;
if last.year then do;
count1 = total1;
count2 = total2;
state = 'USA';
output;
total1 = 0;
total2 = 0;
end;
drop total:;
run;
* Add USA totals to the end of the data;
data want;
set test want;
run;
Obs State year count1 count2
1 MA 2010 33 34
2 MA 2011 54 32
3 MA 2012 45 43
4 MA 2013 49 50
5 LA 2010 29 29
6 LA 2011 32 34
7 LA 2012 34 44
8 LA 2013 44 45
9 WV 2010 33 34
10 WV 2011 34 37
11 WV 2012 44 43
12 WV 2013 33 35
13 USA 2010 95 97
14 USA 2011 120 103
15 USA 2012 123 130
16 USA 2013 126 130
Upvotes: 2