Reputation: 11
I have a dataset of multiple patients, each with various medications and I want to find the first and last date for each patient/medication combination.
HAVE:
ID | MED | start | stop | med_other |
---|---|---|---|---|
B12345 | 86 | 3/23/2021 | 1/9/2022 | |
B12345 | 86 | 8/17/2021 | 1/9/2022 | |
B12345 | 86 | 2/10/2022 | 3/5/2023 | |
B12345 | 99 | 1/1/2020 | 2/15/2020 | someothermedname |
B12345 | 79 | 3/23/2021 | 1/9/2022 | |
B12345 | 79 | 4/20/2022 | 4/21/2022 | |
B12345 | 79 | 5/1/2022 | 4/30/2023 | |
B12345 | 79 | 6/8/2023 | 7/30/2024 | |
A54321 | 86 | 1/1/2019 | 1/3/2019 | |
A54321 | 86 | 2/5/2020 | 3/5/2020 | |
A54321 | 86 | 3/6/2020 | 4/6/2020 | |
A54321 | 50 | 1/1/2019 | 1/3/2019 | |
A54321 | 50 | 2/5/2020 | 3/5/2020 | |
A54321 | 50 | 3/6/2020 | 4/6/2020 | |
A54321 | 50 | 5/4/2020 | 5/5/2020 | |
A54321 | 50 | 5/10/2020 | 5/11/2020 | |
A54321 | 50 | 6/1/2020 | 8/30/2020 | |
C98765 | 25 | 6/8/2024 | 10/11/2024 | |
C98765 | 30 | 6/8/2024 | 12/1/2024 | |
C98765 | 30 | 6/9/2024 | 12/1/2024 | |
C98765 | 30 | 8/17/2024 | 12/31/2024 | |
C98765 | 30 | 1/1/2025 | 1/2/2025 | |
C98765 | 55 | 5/15/2020 | 5/30/2020 | |
C98765 | 55 | 4/15/2021 | 6/30/2022 | |
C98765 | 86 | 1/1/2019 | 2/1/2019 | |
C98765 | 86 | 3/1/2020 | 4/1/2020 | |
C98765 | 86 | 1/1/2024 | 2/1/2024 | |
C98765 | 86 | 3/1/2018 | 3/6/2018 |
Desired result:
ID | MED | start | stop | med_other |
---|---|---|---|---|
B12345 | 86 | 3/23/2021 | 3/5/2023 | |
B12345 | 79 | 3/23/2021 | 7/30/2024 | |
B12345 | 99 | 1/1/2020 | 2/15/2020 | someothermedname |
A54321 | 86 | 1/1/2019 | 4/6/2020 | |
A54321 | 50 | 1/1/2019 | 8/30/2020 | |
C98765 | 30 | 6/8/2024 | 1/2/2025 | |
C98765 | 55 | 5/15/2020 | 6/30/2022 | |
C98765 | 86 | 3/1/2018 | 2/1/2024 |
proc summary data=have nway ;
class id med ;
var start stop;
output out=want min(start)=start max(stop)=stop ;
run;
This gave me one row for each id/med with earliest start and latest stop, but I lose the variable med_other.
How do I get proc summary to include additional variables?
Upvotes: 1
Views: 40
Reputation: 9109
You can get this with IDGROUP. Check out the SAS docs IDGROUP.
data have;
infile cards dsd dlm='09'x;
input ID$ MED (start stop)(:mmddyy10.) med_other :$32.;
format st: mmddyy10.;
cards;
B12345 86 3/23/2021 1/9/2022
B12345 86 8/17/2021 1/9/2022
B12345 86 2/10/2022 3/5/2023
B12345 99 1/1/2020 2/15/2020 someothermedname
B12345 79 3/23/2021 1/9/2022
B12345 79 4/20/2022 4/21/2022
B12345 79 5/1/2022 4/30/2023
B12345 79 6/8/2023 7/30/2024
A54321 86 1/1/2019 1/3/2019
A54321 86 2/5/2020 3/5/2020
A54321 86 3/6/2020 4/6/2020
A54321 50 1/1/2019 1/3/2019
A54321 50 2/5/2020 3/5/2020
A54321 50 3/6/2020 4/6/2020
A54321 50 5/4/2020 5/5/2020
A54321 50 5/10/2020 5/11/2020
A54321 50 6/1/2020 8/30/2020
C98765 25 6/8/2024 10/11/2024
C98765 30 6/8/2024 12/1/2024
C98765 30 6/9/2024 12/1/2024
C98765 30 8/17/2024 12/31/2024
C98765 30 1/1/2025 1/2/2025
C98765 55 5/15/2020 5/30/2020
C98765 55 4/15/2021 6/30/2022
C98765 86 1/1/2019 2/1/2019
C98765 86 3/1/2020 4/1/2020
C98765 86 1/1/2024 2/1/2024
C98765 86 3/1/2018 3/6/2018
;;;;
run;
proc summary data=have nway ;
class id med ;
var start stop;
output out=want min(start)=start max(stop)=stop
idgroup(out(med_other)=);
;
run;
proc print;
run;
Upvotes: 3