Reputation: 25
I have a data frame with many thousands of rows. Every row is a hospitalization record; it contains the ID of the patient and a lot of health information (diagnosis, date of admission, date of discharge, and so on).
Every patient can have more than one hospitalization record, but I need only the first hospitalization of every patient, i.e. the first record for each patient ID according to the date of admission. How can I get this result in R?
Thank you in advance.
Upvotes: 2
Views: 2442
Reputation: 2353
I think I have a solution, but there's probably a smoother way to do this.
Try this using dplyr. Note, I assume that when you say 'first' record you mean the oldest record. If you want the most recent record, use max() instead.
# install.packages('dplyr')  # run once if dplyr is not installed yet
library(dplyr)
your_data <- group_by(your_data, patientID)
## This gives you a data frame with the first admission date for each ID
## (the result column is named explicitly so it can be referenced below)
first_records <- summarise(your_data, admit_date = min(admit_date))
## Create ID to match
first_records$matchID <- paste(first_records$patientID, first_records$admit_date)
your_data$matchID <- paste(your_data$patientID, your_data$admit_date)
## Get complete records
first_records <- your_data[your_data$matchID %in% first_records$matchID, ]
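As a variation on the matching step, a join on both columns avoids building a pasted key. This is only a sketch on made-up data: the column names patientID and admit_date follow the code above, and the toy values and the diagnosis column are invented for illustration.

```r
library(dplyr)

# Toy data, invented for illustration only
your_data <- data.frame(
  patientID  = c(1, 1, 2, 2, 3),
  admit_date = as.Date(c("2015-03-01", "2014-07-15", "2016-01-10",
                         "2016-05-02", "2013-11-30")),
  diagnosis  = c("A", "B", "C", "D", "E")
)

# Earliest admission date per patient
first_dates <- your_data %>%
  group_by(patientID) %>%
  summarise(admit_date = min(admit_date))

# Keep the full rows that match on both patient and date
first_records <- inner_join(your_data, first_dates,
                            by = c("patientID", "admit_date"))
```

Joining on the two real columns also keeps admit_date as a Date instead of comparing pasted strings.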
Lemme know how this goes.
EDIT: @alistaire posted what definitely looks like an easier solution:
your_data <- group_by(your_data, patientID)
## Keep the rows whose admission date is the earliest one for that patient
first_records <- filter(your_data, admit_date == min(admit_date))
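One caveat with the == min() filter: if a patient has two records with the same earliest admission date, both are kept. If exactly one row per patient is needed, slice_min() with with_ties = FALSE is one option (it needs dplyr 1.0.0 or later). A sketch on invented toy data, assuming the same column names as above:

```r
library(dplyr)

# Toy data, invented for illustration: patient 1 has two records on the same first date
your_data <- data.frame(
  patientID  = c(1, 1, 1, 2),
  admit_date = as.Date(c("2014-07-15", "2014-07-15", "2015-03-01", "2016-01-10")),
  diagnosis  = c("A", "B", "C", "D")
)

# filter() keeps every row tied for the earliest date
with_ties <- your_data %>%
  group_by(patientID) %>%
  filter(admit_date == min(admit_date)) %>%
  ungroup()

# slice_min() with with_ties = FALSE keeps exactly one row per patient
one_per_patient <- your_data %>%
  group_by(patientID) %>%
  slice_min(admit_date, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(with_ties)        # 3 rows: both tied records for patient 1, plus patient 2
nrow(one_per_patient)  # 2 rows: one per patient
```

Whether duplicates on the first date should be kept or dropped depends on what the records mean, so check for ties before choosing.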
Upvotes: 1