Reputation: 183
I would like to delete the last row in a dataframe for each group in R based on max(start_date).
Example data:
id start_date end_date
1 2016-01-14 2016-02-14
1 2016-03-14 2016-08-05
2 2014-01-14 2014-02-14
2 2015-03-21 2015-05-21
2 2015-08-23 2015-09-23
2 2015-11-21 2016-01-03
Result:
id start_date end_date
1 2016-01-14 2016-02-14
2 2014-01-14 2014-02-14
2 2015-03-21 2015-05-21
2 2015-08-23 2015-09-23
The following does not work:
df <- df %>%
group_by(id) %>%
summarise(start_date != max(start_date))
Error: found duplicated column name: id
df <- sqldf("select * from df group by id having start_date != max(start_date)")
error in statement: duplicate column name: id
Any suggestions would be great.
Upvotes: 2
Views: 7434
Reputation: 887048
We can use slice
(assuming that the dates are already ordered)
df1 %>%
group_by(id) %>%
slice(-n())
# id start_date end_date
# <int> <chr> <chr>
#1 1 2016-01-14 2016-02-14
#2 2 2014-01-14 2014-02-14
#3 2 2015-03-21 2015-05-21
#4 2 2015-08-23 2015-09-23
If the dates are not ordered, then arrange
and slice
df1 %>%
group_by(id) %>%
arrange(start_date) %>%
slice(-n())
Based on some previous benchmarks (couldn't find the link), the arrange/slice
method would be faster than comparing start_date != max(start_date)
Upvotes: 10