Reputation: 592
I want to create a column "Cured" that takes value 1 when the customer left have1
either on the day or in the two following days after appearing in have2
. Data have1
is a dataset that defines when a client entered and left certain status. have2
shows when the client was contacted.
This is the code I made:
proc SQL;
create table want2 as
select b.*
, case when b.Outcome = "Answered" or
b.Outcome = "Answerphone Message Left" or
b.Outcome = "Answerphone Message Not Left" or
b.Outcome = "No Answer"
and
a.Start_date <= b.Date <= a.End_date
and
a.End_date <= b.Date+2
then 1 else 0 END as Cured
from have1 a, have2 b
where a.ID=b.ID;
quit;
The datasets I'm working with are:
data have1;
infile datalines dlmstr=' | ';
input ID Start_date :ddmmyy10. End_date :ddmmyy10.;
format date start_date date9.;
datalines;
ID | Start_date | End_date
1 | 01/01/2021 | 03/01/2021
1 | 20/01/2021 | 21/01/2021
2 | 05/01/2021 | 07/01/2021
3 | 10/01/2021 | 30/01/2021
3 | 25/01/2021 | 25/01/2021
;;;
run;
data have2;
infile datalines dlmstr=' | ';
input ID Date :ddmmyy10. Outcome ;
format Date date9.;
datalines;
ID | Date | Outcome
1 | 01/01/2021 | Answered
2 | 05/01/2021 | Asnwerphone Message Left
3 | 12/01/2021 | Answerphone Message Left
3 | 25/01/2021 | No Answer
;;;
run;
In this case I should get the following:
ID | Date | Outcome | Cured
1 | 01/01/2021 | Answered | 1
2 | 05/01/2021 | Asnwerphone Message Left | 1
3 | 12/01/2021 | Answerphone Message Left | 0
3 | 25/01/2021 | No Answer | 1
ID1 cured because he left have1
2 days after 01/01
ID2 also cured because they left have1
2 days after 05/01
ID3 didn't cure because they left have1
more than 15 days after 12/01
ID3 cured because they left have1
on the same day
This is the result with that code:
ID| Date | Outcome | Cured
1 01JAN2021 Answered 1
1 01JAN2021 Answered 0
2 05JAN2021 Asnwerph 0
3 12JAN2021 Answerph 0
3 25JAN2021 No Answe 0
3 12JAN2021 Answerph 0
3 25JAN2021 No Answe 1
Which is not right: first of all it duplicates observations, but it doesnt accurately selects those that should be 1 in Cured (ID 2).
Any idea what might be wrong/missing?
Upvotes: 1
Views: 78
Reputation: 27508
Complex logic generally needs parentheses to properly group expressions.
The SQL logic operator AND
has precedence over OR
A linearly coded logic statement such as
P OR Q or R OR S AND X AND Y AND Z
evaluates as
P or Q or R or (((S and X) and Y) and Z)
and generally needs to be corrected to
(P OR Q or R OR S) AND X AND Y AND Z
As stated by @Parfait, you can use IN
operator when P
Q
R
S
are all expressions dealing with matching a single character variable with a number of different terms. Use IN:
to match term prefixes.
Upvotes: 0
Reputation: 51581
Sounds like you want to GROUP the results and set CURED=1 if ANY of the end dates for that ID meet your criteria. So use the MAX() aggregate function.
I am not sure why you are testing the values of OUTCOME since there are no values in your example that are not included in your list of values to test for.
data have1;
input ID Start_date :yymmdd. End_date :yymmdd.;
format Start_date End_date yymmdd10.;
datalines4;
1 2021-01-01 2021-01-03
1 2021-01-20 2021-01-21
2 2021-01-05 2021-01-07
3 2021-01-10 2021-01-30
3 2021-01-25 2021-01-25
;;;;
data have2;
input ID Date :yymmdd. Outcome $40.;
format Date yymmdd10.;
datalines4;
1 2021-01-01 Answered
2 2021-01-05 Answerphone Message Left
3 2021-01-12 Answerphone Message Left
3 2021-01-25 No Answer
4 2021-01-25 No Answer
;;;;
proc SQL;
create table want2 as
select b.id
, b.date
, b.outcome
, max(
case when (a.Start_date <= b.Date <= a.End_date)
and (a.End_date <= b.Date+2) then 1
else 0
end
) as Cured
from have1 a
right join have2 b
on a.id=b.id
group by b.id,b.date,b.outcome
;
quit;
Result:
Obs ID Date Outcome Cured
1 1 2021-01-01 Answered 1
2 2 2021-01-05 Answerphone Message Left 1
3 3 2021-01-12 Answerphone Message Left 0
4 3 2021-01-25 No Answer 1
5 4 2021-01-25 No Answer 0
PS Displaying dates in D-M-Y or M-D-Y order will just confuse half of your audience.
Upvotes: 0
Reputation: 1269973
I suspect that you want:
select b.*,
(case when b.Outcome in ('Answered', 'Answerphone Message Left',
'Answerphone Message Not Left', 'No Answer') and
a.Start_date <= b.Date and
b.Date < a.End_date and
a.End_date <= b.Date+2
then 1 else 0
end) as Cured
I also wonder if you really want a Cartesian product. Usually, you would want join
with a specific join
condition.
Upvotes: 1
Reputation: 107652
In any programming code (beyond SAS and SQL), whenever you use OR
with AND
, always separate conditions with parentheses:
CASE
WHEN (b.Outcome = "Answered" OR
b.Outcome = "Answerphone Message Left" OR
b.Outcome = "Answerphone Message Not Left" OR
b.Outcome = "No Answer")
AND
(a.Start_date <= b.Date < a.End_date)
AND
(a.End_date <= b.Date+2)
THEN 1
ELSE 0
END AS Cured
Better yet, use IN
operator for collection of values and avoid need of OR
:
CASE
WHEN b.Outcome IN ("Answered",
"Answerphone Message Left",
"Answerphone Message Not Left",
"No Answer")
AND
a.Start_date <= b.Date < a.End_date
AND
a.End_date <= b.Date+2
THEN 1
ELSE 0
END AS Cured
Upvotes: 0