Reputation: 27
I am new to SAS and I am trying to build some query using proc sql.
I have the following code in sql (teradata syntax):
create multiset volatile table mvt_data, no log
( date1 date format 'yyyy-mm-dd'
, flag1 byteint
,cust_id varchar(25)
)
primary index (date1)
on commit preserve rows;
insert into mvt_data values (date '2020-01-03', 1, 'A');
insert into mvt_data values (date '2020-02-04', 0, 'A');
insert into mvt_data values (date '2020-04-05', 0, 'B');
insert into mvt_data values (date '2020-01-19', 1, 'C');
insert into mvt_data values (date '2020-03-20', 1, 'B');
insert into mvt_data values (date '2020-06-20', 1, 'D');
I would like to have the following output:
For each Cust_id and YYYYMM of date1 sum the Flag1 and sum Flag1 cumulative with respect to last 6 months.
In same syntax I would do :
with cte_data_ts (ts1, Flag1) as
(
select cast(date1 as timestamp(0)), Flag1
from mvt_data
)
, cte_gbt (YearMonth, SumFlag1, cust_id) as
(
select cust_id as cust_id
,to_char(begin($TD_TIMECODE_RANGE) at 0, 'yyyymm') as YearMonth
, sum(flag1) as SumFlag1
from cte_data_ts
group by time(cal_months(1))
using timecode(ts1)
fill (0)
)
select cust_id, YearMonth, SumFlag1
, sum(SumFlag1) over(order by cust_id, YearMonth asc rows between 6 preceding and current row) as SumFlag1_last6Months
from cte_gbt;
Unfortunatelly I know that over() does not exists in proc sql, so can anyone help me achieve the same result ?
EDIT:
I added a new column (Cust_ID)
Upvotes: 0
Views: 577
Reputation: 51566
So first let's convert the SQL data load into normal SAS data step to have some data to work with.
data mvt_data;
input date1 :yymmdd. flag1 cust_id $;
format date1 yymmdd10.;
cards;
2020-01-03 1 A
2020-02-04 0 A
2020-04-05 0 B
2020-01-19 1 C
2020-03-20 1 B
2020-06-20 1 D
;
Here is a way to generate observations for months that don't appear in the source.
data YearMonth;
min='01JAN2020'd ;
max='01JUN2020'd;
do offset=0 to intck('month',min,max);
do YearMonth=intnx('month',min,offset,'b') ;
output;
end;
end;
keep YearMonth ;
format YearMonth yymmn6.;
run;
Or calculate the min/max values of DATE1 and use those to set the range of dates.
If you don't have another source for the list of Cust_id values you can query the data
select distinct cust_id from mvt_data
So join the two to get the full YearMonth*Cust_id combinations.
select x.cust_id,y.YearMonth
from (select distinct cust_id from mvt_data) x
, YearMonth y
Now join that with the actual data to get the count number where FLAG1 is set per month. You can use the fact that SAS evaluates boolean expressions to 0/1 values to make the code simpler. (In generic SQL implementation you would need to use CASE) So when the date for the flagged observation is in the interval then it contributes 1 to the SUM().
proc sql ;
create table want as
select a.cust_id
, a.YearMonth
, sum(b.date1 between intnx('month',a.YearMonth,0,'b')
and intnx('month',a.YearMonth,0,'e') ) as SumFlag1
, sum(b.date1 between intnx('month',a.YearMonth,-6,'b')
and intnx('month',a.YearMonth,0,'e') ) as SumFlag1_last6Months
, sum(b.date1 between intnx('month',a.YearMonth,-12,'b')
and intnx('month',a.YearMonth,0,'e') ) as SumFlag1_last12Months
from
(select x.cust_id,y.YearMonth
from (select distinct cust_id from mvt_data) x
, YearMonth y
) a
left join mvt_data b
on a.cust_id = b.cust_id and b.Flag1=1
group by 1,2
;
quit;
Results:
Year Sum SumFlag1_ SumFlag1_
Obs cust_id Month Flag1 last6Months last12Months
1 A 202001 1 1 1
2 A 202002 0 1 1
3 A 202003 0 1 1
4 A 202004 0 1 1
5 A 202005 0 1 1
6 A 202006 0 1 1
7 B 202001 0 0 0
8 B 202002 0 0 0
9 B 202003 1 1 1
10 B 202004 0 1 1
11 B 202005 0 1 1
12 B 202006 0 1 1
...
Upvotes: 2
Reputation: 51566
It is probably faster to just use a data step to do the summmary.
For example you could use a temporary array with one item per month to do the accumulation.
So first find a day in the first month you want to include and the number of months after that you want. You could use the range of the months of the date values in the source data or set some fixed range.
proc sql noprint;
select min(date1)
, intck('month',min(date1),max(date1))
into :min_date trimmed
, :nmonths trimmed
from mvt_data
;
quit;
Now use that min date and number of months to drive the counting data step.
Read in all of the records (you only need to records with FLAG1 set). Figure out the offset into the array by using INTCK() function and increment the accumulator for that month. At the end loop over the array and calculate any larger month intervals and output one observation per month.
data want ;
array month[0:&nmonths] _temporary_ (0 &nmonths*0) ;
set mvt_data end=eof;
where Flag1 ;
offset = intck('month',&min_date,date1);
if (0<= offset <= &nmonths) then month[offset]+1;
if eof then do offset=0 to &nmonths;
YearMonth = intnx('month',&min_date,offset);
SumFlag1 = month[offset];
SumFlag1_last6Months = SumFlag1;
do index=max(0,offset-6) to offset-1;
SumFlag1_last6Months = SumFlag1_last6Months+month[index];
end;
output;
end;
keep YearMonth SumFlag1 SumFlag1_last6Months;
format YearMonth yymmn6. ;
run;
Results:
Year Sum SumFlag1_
Obs Month Flag1 last6Months
1 202001 2 2
2 202002 0 2
3 202003 1 3
4 202004 0 3
5 202005 0 3
6 202006 1 4
Upvotes: 0