amestrian
amestrian

Reputation: 592

"case when" going wrong

I want to create a column "Cured" that takes value 1 when the customer left have1 either on the day or in the two following days after appearing in have2. Data have1 is a dataset that defines when a client entered and left certain status. have2 shows when the client was contacted.

This is the code I made:

proc SQL;
create table want2 as
select b.*
,     case when b.Outcome = "Answered" or
                b.Outcome = "Answerphone Message Left" or       
                b.Outcome = "Answerphone Message Not Left" or
                b.Outcome = "No Answer"
                and
                a.Start_date <= b.Date <= a.End_date
                and
                a.End_date <= b.Date+2
                then 1 else 0 END as Cured    
                
from have1  a, have2 b
where a.ID=b.ID;
quit;

The datasets I'm working with are:

data have1;
infile datalines dlmstr=' | ';
input ID  Start_date :ddmmyy10. End_date :ddmmyy10.;
format date start_date date9.;
datalines;
ID | Start_date | End_date 
1  | 01/01/2021 | 03/01/2021
1  | 20/01/2021 | 21/01/2021
2  | 05/01/2021 | 07/01/2021
3  | 10/01/2021 | 30/01/2021
3  | 25/01/2021 | 25/01/2021
;;;
run;

data have2;
infile datalines dlmstr=' | ';
input ID  Date :ddmmyy10. Outcome ;
format Date date9.;
datalines;
ID | Date       | Outcome
1  | 01/01/2021 | Answered
2  | 05/01/2021 | Asnwerphone Message Left
3  | 12/01/2021 | Answerphone Message Left
3  | 25/01/2021 | No Answer
;;;
run;

In this case I should get the following:

ID | Date       | Outcome                   | Cured 
1  | 01/01/2021 | Answered                  | 1  
2  | 05/01/2021 | Asnwerphone Message Left  | 1
3  | 12/01/2021 | Answerphone Message Left  | 0
3  | 25/01/2021 | No Answer                 | 1

ID1 cured because he left have1 2 days after 01/01

ID2 also cured because they left have1 2 days after 05/01

ID3 didn't cure because they left have1 more than 15 days after 12/01

ID3 cured because they left have1 on the same day

This is the result with that code:

ID| Date      | Outcome  | Cured
1   01JAN2021   Answered    1
1   01JAN2021   Answered    0
2   05JAN2021   Asnwerph    0
3   12JAN2021   Answerph    0
3   25JAN2021   No Answe    0
3   12JAN2021   Answerph    0
3   25JAN2021   No Answe    1

Which is not right: first of all it duplicates observations, but it doesnt accurately selects those that should be 1 in Cured (ID 2).

Any idea what might be wrong/missing?

Upvotes: 1

Views: 78

Answers (4)

Richard
Richard

Reputation: 27508

Complex logic generally needs parentheses to properly group expressions.

The SQL logic operator AND has precedence over OR

A linearly coded logic statement such as

P OR Q or R OR S AND X AND Y AND Z

evaluates as

P or Q or R or (((S and X) and Y) and Z)

and generally needs to be corrected to

(P OR Q or R OR S) AND X AND Y AND Z

As stated by @Parfait, you can use IN operator when P Q R S are all expressions dealing with matching a single character variable with a number of different terms. Use IN: to match term prefixes.

Upvotes: 0

Tom
Tom

Reputation: 51581

Sounds like you want to GROUP the results and set CURED=1 if ANY of the end dates for that ID meet your criteria. So use the MAX() aggregate function.

I am not sure why you are testing the values of OUTCOME since there are no values in your example that are not included in your list of values to test for.

data have1;
  input ID  Start_date :yymmdd. End_date :yymmdd.;
  format Start_date End_date yymmdd10.;
datalines4;
1 2021-01-01 2021-01-03
1 2021-01-20 2021-01-21
2 2021-01-05 2021-01-07
3 2021-01-10 2021-01-30
3 2021-01-25 2021-01-25
;;;;

data have2;
  input ID  Date :yymmdd. Outcome $40.;
  format Date yymmdd10.;
datalines4;
1 2021-01-01 Answered
2 2021-01-05 Answerphone Message Left
3 2021-01-12 Answerphone Message Left
3 2021-01-25 No Answer
4 2021-01-25 No Answer
;;;;

proc SQL;
create table want2 as
select b.id
     , b.date
     , b.outcome
     , max(
         case when (a.Start_date <= b.Date <= a.End_date)
               and (a.End_date <= b.Date+2) then 1
         else 0 
       end
       ) as Cured    
from have1 a
 right join have2 b
 on a.id=b.id
group by b.id,b.date,b.outcome
;
quit;

Result:

Obs    ID          Date    Outcome                     Cured

 1      1    2021-01-01    Answered                      1
 2      2    2021-01-05    Answerphone Message Left      1
 3      3    2021-01-12    Answerphone Message Left      0
 4      3    2021-01-25    No Answer                     1
 5      4    2021-01-25    No Answer                     0

PS Displaying dates in D-M-Y or M-D-Y order will just confuse half of your audience.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269973

I suspect that you want:

select b.*,
       (case when b.Outcome in ('Answered', 'Answerphone Message Left', 
 'Answerphone Message Not Left', 'No Answer') and
                  a.Start_date <= b.Date and
                  b.Date < a.End_date and
                  a.End_date <= b.Date+2
             then 1 else 0 
        end) as Cured 

I also wonder if you really want a Cartesian product. Usually, you would want join with a specific join condition.

Upvotes: 1

Parfait
Parfait

Reputation: 107652

In any programming code (beyond SAS and SQL), whenever you use OR with AND, always separate conditions with parentheses:

CASE 
   WHEN (b.Outcome = "Answered" OR 
         b.Outcome = "Answerphone Message Left" OR 
         b.Outcome = "Answerphone Message Not Left" OR 
         b.Outcome = "No Answer")
        AND
        (a.Start_date <= b.Date < a.End_date)
        AND 
        (a.End_date <= b.Date+2)
   THEN 1 
   ELSE 0
 END AS Cured 

Better yet, use IN operator for collection of values and avoid need of OR:

CASE 
   WHEN b.Outcome IN ("Answered",
                      "Answerphone Message Left",
                      "Answerphone Message Not Left",
                      "No Answer")
        AND
        a.Start_date <= b.Date < a.End_date
        AND 
        a.End_date <= b.Date+2
   THEN 1 
   ELSE 0
 END AS Cured 

Upvotes: 0

Related Questions