Reputation: 724
I have a data frame ordered desc on the calving column as below.
Served Calved ProfileID
1 2015-07-29 2017-05-07 1346
2 2015-07-29 2017-05-06 2645
3 2016-06-12 2017-05-05 3687
4 2016-05-19 2017-05-05 3687
5 2015-05-21 2017-05-05 3687
6 2013-05-08 2017-05-05 3687
7 2015-08-08 2016-05-04 4235
8 2015-06-14 2016-05-04 4235
9 2015-05-31 2016-05-04 4235
10 2013-08-13 2014-05-02 5425
11 2013-07-23 2014-05-02 5425
12 2012-03-01 2014-05-02 5425
13 2017-07-11 2013-04-22 5425
14 2012-11-01 2013-04-22 5425
15 2015-12-23 2013-04-22 5425
16 2014-05-10 2013-04-22 5425
I would like to remove the duplicates from the Calved column keeping one observance from the ProfileID column depending on the date in the calved column like so
Served Calved ProfileID
1 2015-07-29 2017-05-07 1346
2 2015-07-29 2017-05-06 2645
3 2016-06-12 2017-05-05 3687
7 2015-08-08 2016-05-04 4235
10 2013-08-13 2014-05-02 5425
13 2017-07-11 2013-04-22 5425
I achieved this using
on_served_profileID<-master_arranged[!duplicated(master_arranged[c("Calved","ProfileID")]),]
I would like to add and and condition so that the row which is selected from Calved column is less than the served column and not just the first occurrence in each date.
For line 13 of the output I would rather this be line 14 because the served column is less than the calved column like so, rather than giving me the first observance for each date in the Calved column.
Served Calved ProfileID
1 2015-07-29 2017-05-07 1346
2 2015-07-29 2017-05-06 2645
3 2016-06-12 2017-05-05 3687
7 2015-08-08 2016-05-04 4235
10 2015-08-13 2014-05-02 5425
14 2012-11-01 2013-04-22 5425
I have tried, and variations of:
on_served_profileID<-master_arranged[!duplicated(master_arranged[c("Calved","ProfileID")])& master_arranged$Served < master_arranged$Calved,]
This is to try and select the calved observance which is less than the served observance hence the & condition "$served < $Calved"
Upvotes: 0
Views: 79
Reputation: 11955
library(dplyr)
df$Served <- as.Date(df$Served)
df$Calved <- as.Date(df$Calved)
df %>%
group_by(Calved, ProfileID) %>%
summarise(Served = Served[first(which(Served < Calved))]) %>%
arrange(desc(Calved))
Output is:
Calved ProfileID Served
1 2017-05-07 1346 2015-07-29
2 2017-05-06 2645 2015-07-29
3 2017-05-05 3687 2016-06-12
4 2016-05-04 4235 2015-08-08
5 2014-05-02 5425 2013-08-13
6 2013-04-22 5425 2012-11-01
Sample data:
df <- structure(list(Served = c("2015-07-29", "2015-07-29", "2016-06-12",
"2016-05-19", "2015-05-21", "2013-05-08", "2015-08-08", "2015-06-14",
"2015-05-31", "2013-08-13", "2013-07-23", "2012-03-01", "2017-07-11",
"2012-11-01", "2015-12-23", "2014-05-10"), Calved = c("2017-05-07",
"2017-05-06", "2017-05-05", "2017-05-05", "2017-05-05", "2017-05-05",
"2016-05-04", "2016-05-04", "2016-05-04", "2014-05-02", "2014-05-02",
"2014-05-02", "2013-04-22", "2013-04-22", "2013-04-22", "2013-04-22"
), ProfileID = c(1346L, 2645L, 3687L, 3687L, 3687L, 3687L, 4235L,
4235L, 4235L, 5425L, 5425L, 5425L, 5425L, 5425L, 5425L, 5425L
)), .Names = c("Served", "Calved", "ProfileID"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16"))
Upvotes: 1