Zhengxi Jiang
Zhengxi Jiang

Reputation: 49

SAS proc sql, how to find out the missing rows in a yearly table for each claim case?

I am working on a yearly table for claim cases. the goal is for each existing claim, there should be one row at the end of each year to reflect the situation of that claim case.for example claimA has opened in 2020 and closed in 2020, in the database I will only have one row that is the row with date 31/12/2020. I should add the duplication of 2020 for the years of 2021-2024, so that I have one row for each year. the manager will ahve a cumulative view of all the cost for all the claim cases together. now I have figuered out how to add rows of 2021-2024 if this case is definitively closed.

the specific cases that I didn't figured out the solution is the cases that are reopened. Take this example, suppose claimA is in 2023 reopned. because now the status is active, in the data base there are lines for 2023 and 2024. But the rows for 2021 and 2022 are missing. as We need the cumulative total, so first thing need to be solved is which year are missing and then duplicate the situation of which year? in this case the program should duplicate the situation of 2020 for 2021 and 2022. Can the comunity give an direction that I can work further ?Thank you in advance!

enter image description here

the wanted result is enter image description here PROC SQL ; CREATE TABLE ClosedClaims_LatestRecordw AS SELECT *, 1 AS JOIN_KEY FROM ClosedClaimsw WHERE DATE = MAXMAXDATE; quit;

proc sql;
   create table year_table (YEAR num , JOIN_KEY num);
/*insert values into table*/         
   insert into year_table
    values( 2020 , 1 )
    values( 2021 , 1 )
    values( 2022 , 1 )
    values( 2023 , 1 )
    values( 2024 , 1 );
/*display table*/
   select * from year_table;
quit;

Proc sql;
CREATE TABLE ClosedClaims_Joinw AS
SELECT     CC.*, YT.YEAR AS YEAR_ADDEDRECORD
FROM ClosedClaims_LatestRecordw CC
INNER JOIN year_table YT ON CC.JOIN_KEY = YT.JOIN_KEY;
quit;

* in this code, keep only the extra lines that is newer than the maxdate of the claim;
proc sql;
CREATE TABLE ClosedClaims_NewRecordsw AS
SELECT     * FROM     ClosedClaims_Joinw
WHERE YEAR( DATE ) < YEAR_ADDEDRECORD;
quit;

Upvotes: 0

Views: 84

Answers (2)

Richard
Richard

Reputation: 27508

You can use a triangular join (the on condition uses <) to compute a cumulative sum. Note: There are other non-SQL ways that could be quicker or use less resources.

Example:

Compute cumulatives over the years 2018 to 2025

data have ;
  call streaminit(2024) ;
  do caseid = 1 to 20 ;
    format start final yymmdd10. ;
    final = '01jan2018'd ;
    do _n_ = 0 by 1 until (final > today()) ;
      if _n_ and rand('uniform') > 1 / sqrt((_n_+1)) then leave ;
      start = final + rand('integer', 1000) ; 
      final = start + rand('integer', 365) ;
      amount = rand ('integer', 1000, 2000) ;
      checkid + 1 ;
      output ;
    end ;
  end ;
run ;

data years ;
  do year = 2018 to 2025 ;
    output ;
  end ;
run ;

proc sql ;
  * for demonstration of the details that will be grouped for aggregation ;
  create table want_detail_triangle as
  select caseid, year, start, final, checkid, amount
  from years
  left join have on have.final < mdy(12,31,years.year)
  order by caseid, year, final
  ;

  * cumulative year-end aggregates ;
  create table want as
  select caseid, year as asof_year_end
  , sum(amount) as cumulative_total_amount
  , count(checkid) as checks_issued_count
  from years
  left join have as one on one.final < mdy(12,31,years.year)
  group by
    caseid, year
  ;
quit ;

Cumulative aggregates for caseid=10

enter image description here

Computed from triangular join of details

enter image description here

Upvotes: 0

Tom
Tom

Reputation: 51611

Don't use SQL to do things that are much, much easier in normal SAS code.

So assuming you have dataset named CLAIM_CASES with variables OPENED_DATE and CLOSED_DATE then you can make a dataset that expands that to have one observation per year that the claim crosses with a simple data step like this:

data expand;
  set claim_cases;
  do year=year(opened_date) to year(closed_date);
    output;
  end;
run;

To make that YEAR_TABLE dataset just use a simple DO loop.

data year_table;
  do year=2020 to 2024;
    output;
  end;
run;

Upvotes: 2

Related Questions