Reputation: 147
I have a dataset of patient diagnoses with one diagnosis code per line, so a patient with several diagnoses appears on multiple lines. Each patient has a unique patientID. I also have age, race, gender, etc. data on these patients.
How do I indicate to SAS, when using PROC FREQ, LOGISTIC, UNIVARIATE, etc., that rows with the same patientID belong to the same patient?
This is an example of what the data looks like:
patientID  diagnosis  age  gender  lab
1          15.02      65   M       positive
1          250.2      65   M       positive
2          348.2      23   M       negative
2          282.1      23   M       negative
3                     50   F       positive
I was given data on every patient who has had a certain lab (regardless of whether the result was positive), as well as all of their diagnoses, each of which appears on a different line (as a different observation to SAS). First, I will need to exclude every patient who has a negative result for the lab, which I plan to do with an IF statement. The lab determines whether the patient has disease X. Some patients, such as patient #3, do not have any diseases other than disease X.
Analyses I would like to perform:
Thanks!
Upvotes: 0
Views: 238
Reputation: 21294
The short answer is that you cannot, at least not by default: the procedures treat every row as a separate observation. You can account for that easily when you process the data, though. IMO keeping the data in long format is easier.
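As a rough sketch of what accounting for it can look like for patient-level variables such as age and gender: reduce the data to one row per patient before running the procedure. This assumes the input dataset is called `have`, that the demographics are identical on every row for a patient (as in the sample data), and `one_per_patient` is just a made-up output name.

```sas
/* Keep one row per patient; assumes age and gender repeat unchanged on every row */
proc sort data=have out=one_per_patient nodupkey;
    by patientID;
run;

/* Each patient now contributes exactly once to patient-level summaries */
proc freq data=one_per_patient;
    tables gender;
run;

proc univariate data=one_per_patient;
    var age;
run;
```

The long dataset stays untouched, so diagnosis-level questions can still be answered from `have` directly.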
You've asked too many questions above, so I'll answer just one: how to count the number of people with a given disease.
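/* Keep one row per patient-diagnosis pair (variable names as in the question's sample) */
proc sort data=have out=unique_disease_patient nodupkey;
    by patientID diagnosis;
run;

/* Count patients per diagnosis; NOPRINT suppresses the listing, OUT= saves the counts */
proc freq data=unique_disease_patient noprint;
    tables diagnosis / out=disease_patient_count;
run;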
Note that this is much easier in SQL:
proc sql;
    create table want as
    select diagnosis,
           count(distinct patientID) as n_patients
    from have
    group by diagnosis;
quit;
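If the patients with a negative lab have to be excluded first, as the question describes, the same kind of query can take a WHERE clause. This is only a sketch: it assumes `lab` is a character variable holding exactly `positive` or `negative`, uses the variable name `diagnosis` from the question's sample data, and `positive_counts` is a made-up table name.

```sas
proc sql;
    create table positive_counts as
    select diagnosis,
           count(distinct patientID) as n_patients  /* each patient counted once per diagnosis */
    from have
    where lab = 'positive'                           /* drop rows from lab-negative patients */
    group by diagnosis;
quit;
```

Patients with no recorded diagnosis, like patient 3 in the sample, would end up in a group with a missing `diagnosis` value, so that row of the result needs to be read accordingly.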
I'm assuming this is homework because you're unlikely to do this in practice except for exploratory analysis.
Upvotes: 2