Reputation: 31
I am having trouble creating an efficient way to use PROC SQL in SAS to gather monthly data for 4 years (2017, 2018, 2019, through now of 2020).
My current (shortened) code:
PROC SQL;
select
count(VAL1) as name1, sum(VAL2) as name2
from table tbl
WHERE tbl.dte >= '20170101' and tbl.dte < '20170201'
);
I am currently just using a copy and paste method over and over, but I would need to do this over a hundred times, for four tables (equaling to about 500 times).
Is there a more efficient way to do this?
Upvotes: 0
Views: 600
Reputation: 51611
Since your strings are in YYYYMMDD order just take the first 6 characters to get a distinct value for each month.
select substr(dte,1,6) as month
, count(VAL1) as name1
, sum(VAL2) as name2
from have
group by month
;
Upvotes: 0
Reputation: 27508
Aggregate computations can be performed using SQL
or a statistics procedure such as Proc MEANS
.
Consider a data set containing individual locations reports of daily sales for 50 locations. The aggregation would be total sales (over all locations) at the monthly level. Management also wants to know the number of sales reports in the month.
Example:
data raw(label='Some US Census sales data as basis of simulation' keep=_q _sales);
* some data copied from https://www.census.gov/retail;
format _q yyq.; input @1 q 1. @13 y 4. _sales: comma8.;
_q = yyq(y,q);
datalines;
4th quarter 2019 1,464,339
3rd quarter 2019 1,381,537
2nd quarter 2019 1,377,288
1st quarter 2019 1,241,540
4th quarter 2018 1,407,934
3rd quarter 2018 1,323,360
2nd quarter 2018 1,332,848
1st quarter 2018 1,219,133
4th quarter 2017 1,361,001
3rd quarter 2017 1,262,868
2nd quarter 2017 1,266,215
1st quarter 2017 1,156,810
4th quarter 2016 1,294,590
3rd quarter 2016 1,217,376
2nd quarter 2016 1,218,921
1st quarter 2016 1,120,887
4th quarter 2015 1,253,997
3rd quarter 2015 1,193,142
2nd quarter 2015 1,194,480
1st quarter 2015 1,084,374
4th quarter 2014 1,231,471
3rd quarter 2014 1,170,225
2nd quarter 2014 1,177,252
1st quarter 2014 1,060,492
run;
proc sort data=raw; by _q;
run;
data have(label='Simulate some activity to be summarized');
set raw;
days = intnx('quarter',_q,0,'E') - _q;
_x = _sales / 50 / days;
do date = _q to _q + days;
datestring = put (date, yymmddn8.);
do storeid = 1 to 50;
reportid + 1;
sales = round(_x - 25 + rand('uniform', 50));
output;
end;
end;
keep datestring storeid reportid sales;
run;
* Compute monthly aggregates - SQL way;
proc sql;
create table want as
select
intnx('month', date, 0) as month format=yymon7.
, count(reportid) as report_count format=comma7.
, sum(sales) as month_sales format=dollar12.
from
(
select
input(datestring,yymmdd8.) as date,
have.*
from have
) have /* this is now an alias for outer scope */
group by calculated month
;
* convert datestring to date value;
data have_v / view=have_v;
set have;
date = input (datestring,yymmdd8.); format date yymmddn8.;
run;
* Compute monthly aggregates - MEANS way;
* Grouping occurs at the formatted values of the BY variables(s);
* The date format yymon7. is used to force aggregation by month;
proc means noprint data=have_v;
by date;
var sales reportid;
format date yymon7.;
output out=monthly_summary n(reportid)=count sum(sales)=sales;
format sales dollar12. count comma7.;
run;
Upvotes: 0
Reputation: 1270401
How about aggregation?
select year(tbl.dte), month(tbl.dte), count(VAL1) as name1, sum(VAL2) as name2
from table tbl
where tbl.dte >= '20170101'
group by year(tbl.dte), month(tbl.dte)
Upvotes: 1