Reputation: 47
I have a fairly large dataset (35 variables and 65,000 rows) and I would like to split it into three parts according to specific dates. I have information about animals before and after a surgery. I'm currently using the dplyr package. Below I show what my dataset looks like. I only give an example, because calling dput on my dataset produces something really large and unreadable. As in the example, I have several dates at which measurements were taken for an individual. Each individual also has a surgery date, which is unique to that individual. As in the example, measurements were taken over several years.
Name Date Measurement Surgery_date
Pierre 2016-03-15 5.12 2017-03-21
Pierre 2017-03-16 4.16 2017-03-21
Pierre 2017-08-09 5.08 2017-03-21
Paul 2016-07-03 5.47 2017-03-25
Paul 2016-09-30 4.98 2017-03-25
Paul 2017-04-12 4.51 2017-03-25
So far I've been careful to store both the measurement dates and the surgery dates in date format, using the lubridate package. Then I tried to sort my data with the dplyr package. I tried filter and select, but neither of them gave the expected results.
data1$Date <- parse_date_time(data1$Date, "d/m/y")
data1$Date <- ymd(data1$Date)
data1$Surgery_date <- parse_date_time(data1$Surgery_date, "d/m/y")
data1$Surgery_date <- ymd(data1$Surgery_date)
before_surgery <- data1
before_surgery <- dplyr::as_tibble(before_surgery)
before_surgery <- before_surgery %>%
  filter(Date > Surgery_date)
before_surgery <- before_surgery %>%
  select(Date < Surgery_date)
Either way, no row is removed. When I try (by the same means) to get the dates after surgery, no row is selected.
I have checked my file to make sure there actually are dates both before and after the surgery date (otherwise this result would be normal), and I can confirm that both kinds of dates are in the dataset.
I have only shown the example for the dates before surgery, assuming the dates after surgery follow the same pattern.
Thank you in advance to those who take the time to read this. I'm sorry if the question is similar to other ones, but I have not been able to figure out a solution on my own...
EDIT: To be more specific, the ultimate goal is to have three separate datasets. The first would cover all measurements taken before the surgery, the second the day of the surgery itself + 5 days (but I'll try to handle this one later on), and the third would cover measurements taken after the surgery.
Upvotes: 0
Views: 964
Reputation: 2222
The solution to what you are asking is straightforward, because you can in fact filter on dates and compare dates across columns. Please try the code below and confirm for yourself that it works as you expect. If this approach does not work on your own dataset, please share more about your data and processing, because there is probably an error in your code. (One error I already saw: you can't use select(Date < Surgery_date). select picks columns, so to keep the rows that meet a condition you need to use filter.)
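To make the difference between the two verbs concrete, here is a minimal sketch on a made-up two-row table. The name toy and its values are hypothetical and only serve as an illustration; they are not taken from your data.

```r
library(dplyr)

# Hypothetical toy data, only to illustrate the two verbs
toy <- tibble(
  Name = c("A", "B"),
  Date = as.Date(c("2017-01-01", "2017-06-01")),
  Surgery_date = as.Date(c("2017-03-01", "2017-03-01"))
)

# filter() works on rows: keep measurements taken before the surgery
rows_before <- toy %>% filter(Date < Surgery_date)

# select() works on columns: keep only the two date columns
date_cols <- toy %>% select(Date, Surgery_date)
```

Here rows_before keeps only the rows whose Date is earlier than Surgery_date, while date_cols keeps every row but only two columns.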
This is how I would approach your problem. As you can see, the code is very straightforward.
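library(dplyr)
library(lubridate)

# Example data, with both date columns parsed as Date
df <- data.frame(
  Name = c(rep('Pierre', 3), rep('Paul', 3)),
  Date = c('2016-03-15', '2017-03-26', '2017-08-09', '2016-07-03', '2016-09-30', '2017-04-12'),
  Measurement = c(5.12, 4.16, 5.08, 5.47, 4.98, 4.51),
  Surgery_date = c(rep('2017-03-21', 3), rep('2017-03-25', 3))
) %>%
  mutate(Surgery_date = ymd(Surgery_date),
         Date = ymd(Date))

# Measurements taken before the surgery
df %>%
  filter(Date < Surgery_date)

# Measurements strictly between the surgery day and day 5
# (use >= and <= instead to include both boundary days)
df %>%
  filter(Date > Surgery_date & Date < (Surgery_date + days(5)))

# Measurements taken after the surgery
df %>%
  filter(Date > Surgery_date)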
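If the three datasets should not overlap, one possible variant is to label each row with a period first and then filter on that label. This is only a sketch under assumptions of mine: the column name period, the object names, and the choice that the middle window runs from the surgery day through day 5 inclusive are not from the question, so adjust the boundaries to your own definition.

```r
library(dplyr)
library(lubridate)

# Same example values as in the answer's data frame
df <- data.frame(
  Name = c(rep("Pierre", 3), rep("Paul", 3)),
  Date = ymd(c("2016-03-15", "2017-03-26", "2017-08-09",
               "2016-07-03", "2016-09-30", "2017-04-12")),
  Measurement = c(5.12, 4.16, 5.08, 5.47, 4.98, 4.51),
  Surgery_date = ymd(c(rep("2017-03-21", 3), rep("2017-03-25", 3)))
)

# Assumption: "surgery" covers the surgery day through day 5 inclusive,
# and "after" starts on day 6. case_when() uses the first matching branch.
labelled <- df %>%
  mutate(period = case_when(
    Date < Surgery_date            ~ "before",
    Date <= Surgery_date + days(5) ~ "surgery",
    TRUE                           ~ "after"
  ))

# Three non-overlapping datasets
before_surgery <- labelled %>% filter(period == "before")
around_surgery <- labelled %>% filter(period == "surgery")
after_surgery  <- labelled %>% filter(period == "after")
```

Note that a row with a missing Date or Surgery_date matches neither of the first two branches and would end up in "after", so it is worth checking for NA values in both columns after parsing.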
Upvotes: 1