user6301793

Reputation: 25

How to get the first rows in an R dataframe that meet a specific condition?

I have a dataframe with many thousands of rows. Every row is a hospitalization record; it contains the ID of the patient and a lot of health information (diagnosis, date of admission, date of discharge, and so on).

Every patient can have more than one hospitalization record, but I need only the first hospitalization of every patient, i.e. the first record for each patient ID according to the date of admission. How can I get this result in R?

Thank you in advance.

Upvotes: 2

Views: 2442

Answers (1)

Raphael K

Reputation: 2353

I think I have a solution, but there's probably a smoother way to do this.

Try this using dplyr. Note, I assume that when you say 'first' record you mean oldest record. If you want the most recent record, use max() instead.

install.packages('dplyr')
library(dplyr)

your_data <- group_by(your_data, patientID)
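## This gives you a data frame with each patient ID and its earliest admission date.
## The summary column is named admit_date so it can be referenced below.
first_records <- summarise(your_data, admit_date = min(admit_date))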

## Create ID to match 
first_records$matchID <- paste(first_records$patientID, first_records$admit_date)
your_data$matchID <- paste(your_data$patientID, your_data$admit_date)

## Get complete records
first_records <- your_data[your_data$matchID %in% first_records$matchID, ]

Lemme know how this goes.

EDIT: @alistaire posted what definitely looks like an easier solution:

your_data <- group_by(your_data, patientID)
first_records <- filter(your_data, admit_date == min(admit_date))
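
For anyone who wants to try the grouped filter on something small first, here is a minimal sketch on made-up data. The column names patientID and admit_date and the toy values are assumptions, not the asker's real data, and it assumes admit_date is stored as a Date (with plain character dates, min() would compare text rather than time). It also shows one thing to watch for: if a patient has two records on the same earliest date, filter() keeps both, whereas slice_min() with with_ties = FALSE (dplyr 1.0.0 or later) keeps exactly one.

```r
library(dplyr)

# Toy data; column names and values are made up for illustration
your_data <- data.frame(
  patientID  = c(1, 1, 2, 2, 3),
  admit_date = as.Date(c("2015-03-10", "2014-01-05", "2016-07-01",
                         "2016-07-01", "2013-11-20")),
  diagnosis  = c("A", "B", "C", "D", "E")
)

# Keeps every row tied for the earliest date (patient 2 keeps two rows)
first_records <- your_data %>%
  group_by(patientID) %>%
  filter(admit_date == min(admit_date)) %>%
  ungroup()

# Keeps exactly one row per patient, even when dates tie
first_unique <- your_data %>%
  group_by(patientID) %>%
  slice_min(admit_date, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Which of the two you want depends on whether same-day duplicate admissions are meaningful in your data or just noise.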

Upvotes: 1
