user8419142
user8419142

Reputation: 27

SQL statment syntax from Teradata to PROC SQL

I am new to SAS and I am trying to build some query using proc sql.

I have the following code in sql (teradata syntax):

create multiset volatile table mvt_data, no log
( date1     date format 'yyyy-mm-dd'
, flag1     byteint
,cust_id    varchar(25)
)
primary index (date1)
on commit preserve rows;

insert into mvt_data values (date '2020-01-03', 1, 'A');
insert into mvt_data values (date '2020-02-04', 0, 'A');
insert into mvt_data values (date '2020-04-05', 0, 'B');
insert into mvt_data values (date '2020-01-19', 1, 'C');
insert into mvt_data values (date '2020-03-20', 1, 'B');
insert into mvt_data values (date '2020-06-20', 1, 'D');

I would like to have the following output:

For each Cust_id and YYYYMM of date1 sum the Flag1 and sum Flag1 cumulative with respect to last 6 months.

In same syntax I would do :

with cte_data_ts (ts1, Flag1) as
(
select cast(date1 as timestamp(0)), Flag1
  from mvt_data
)
  ,  cte_gbt (YearMonth, SumFlag1, cust_id) as
(
  select cust_id                                          as cust_id
       ,to_char(begin($TD_TIMECODE_RANGE) at 0, 'yyyymm') as YearMonth
       , sum(flag1)                                       as SumFlag1
    from cte_data_ts
group by time(cal_months(1))
   using timecode(ts1)
    fill (0)
)
  select cust_id, YearMonth, SumFlag1
       , sum(SumFlag1) over(order by cust_id, YearMonth asc rows between 6 preceding and current row) as SumFlag1_last6Months
    from cte_gbt;

Unfortunatelly I know that over() does not exists in proc sql, so can anyone help me achieve the same result ?

EDIT:

I added a new column (Cust_ID)

Upvotes: 0

Views: 577

Answers (2)

Tom
Tom

Reputation: 51566

So first let's convert the SQL data load into normal SAS data step to have some data to work with.

data mvt_data;
  input date1 :yymmdd. flag1 cust_id $;
  format date1 yymmdd10.;
cards;
2020-01-03 1 A
2020-02-04 0 A
2020-04-05 0 B
2020-01-19 1 C
2020-03-20 1 B
2020-06-20 1 D
;

Here is a way to generate observations for months that don't appear in the source.

data YearMonth;
  min='01JAN2020'd ;
  max='01JUN2020'd;
  do offset=0 to intck('month',min,max);
    do YearMonth=intnx('month',min,offset,'b') ;
      output;
    end;
  end;
  keep YearMonth ;
  format YearMonth yymmn6.;
run;

Or calculate the min/max values of DATE1 and use those to set the range of dates.

If you don't have another source for the list of Cust_id values you can query the data

select distinct cust_id from mvt_data

So join the two to get the full YearMonth*Cust_id combinations.

select x.cust_id,y.YearMonth 
   from (select distinct cust_id from mvt_data) x
      , YearMonth y

Now join that with the actual data to get the count number where FLAG1 is set per month. You can use the fact that SAS evaluates boolean expressions to 0/1 values to make the code simpler. (In generic SQL implementation you would need to use CASE) So when the date for the flagged observation is in the interval then it contributes 1 to the SUM().

proc sql ;
 create table want as
 select a.cust_id
      , a.YearMonth
      , sum(b.date1 between intnx('month',a.YearMonth,0,'b')
                        and intnx('month',a.YearMonth,0,'e') ) as SumFlag1
      , sum(b.date1 between intnx('month',a.YearMonth,-6,'b')
                        and intnx('month',a.YearMonth,0,'e') ) as SumFlag1_last6Months 
      , sum(b.date1 between intnx('month',a.YearMonth,-12,'b')
                        and intnx('month',a.YearMonth,0,'e') ) as SumFlag1_last12Months 
 from 
  (select x.cust_id,y.YearMonth 
   from (select distinct cust_id from mvt_data) x
      , YearMonth y
  ) a
 left join mvt_data b
   on a.cust_id = b.cust_id and b.Flag1=1
 group by 1,2
 ;
quit;

Results:

                    Year     Sum      SumFlag1_       SumFlag1_
Obs    cust_id     Month    Flag1    last6Months    last12Months

  1       A       202001      1           1               1
  2       A       202002      0           1               1
  3       A       202003      0           1               1
  4       A       202004      0           1               1
  5       A       202005      0           1               1
  6       A       202006      0           1               1
  7       B       202001      0           0               0
  8       B       202002      0           0               0
  9       B       202003      1           1               1
 10       B       202004      0           1               1
 11       B       202005      0           1               1
 12       B       202006      0           1               1
 ...

Upvotes: 2

Tom
Tom

Reputation: 51566

It is probably faster to just use a data step to do the summmary.

For example you could use a temporary array with one item per month to do the accumulation.

So first find a day in the first month you want to include and the number of months after that you want. You could use the range of the months of the date values in the source data or set some fixed range.

proc sql noprint;
  select min(date1)
       , intck('month',min(date1),max(date1))
    into :min_date trimmed
       , :nmonths trimmed
   from mvt_data
  ;
quit;

Now use that min date and number of months to drive the counting data step.

Read in all of the records (you only need to records with FLAG1 set). Figure out the offset into the array by using INTCK() function and increment the accumulator for that month. At the end loop over the array and calculate any larger month intervals and output one observation per month.

data want ;
  array month[0:&nmonths] _temporary_ (0 &nmonths*0) ;
  set mvt_data end=eof;
  where Flag1 ;
  offset = intck('month',&min_date,date1);
  if (0<= offset <= &nmonths) then month[offset]+1;
  if eof then do offset=0 to &nmonths;
    YearMonth = intnx('month',&min_date,offset);
    SumFlag1 = month[offset];
    SumFlag1_last6Months = SumFlag1;
    do index=max(0,offset-6) to offset-1;
       SumFlag1_last6Months = SumFlag1_last6Months+month[index];
    end;
    output;
  end;
  keep YearMonth SumFlag1 SumFlag1_last6Months;
  format YearMonth yymmn6. ;
run;

Results:

         Year     Sum      SumFlag1_
Obs     Month    Flag1    last6Months

 1     202001      2           2
 2     202002      0           2
 3     202003      1           3
 4     202004      0           3
 5     202005      0           3
 6     202006      1           4

Upvotes: 0

Related Questions