Reputation: 8333
Situation
I have a data frame df
:
df <- structure(list(person = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L), .Label = c("pA", "pB", "pC"), class = "factor"), date = structure(c(16071,
16102, 16130, 16161, 16071, 16102, 16130, 16071, 16102), class = "Date")), .Names = c("person",
"date"), row.names = c(NA, -9L), class = "data.frame")
> df
person date
1 pA 2014-01-01
2 pA 2014-02-01
3 pA 2014-03-01
4 pA 2014-04-01
5 pB 2014-01-01
6 pB 2014-02-01
7 pB 2014-03-01
8 pC 2014-01-01
9 pC 2014-02-01
Question
How can I select the last 2 (or 'n') entries, ordered by date, for each person, so that I have a resulting data frame df1
:
> df1
person date
1 pA 2014-03-01
2 pA 2014-04-01
3 pB 2014-02-01
4 pB 2014-03-01
5 pC 2014-01-01
6 pC 2014-02-01
?
I've tried combinations of
library(dplyr)
df1 <- df %>%
group_by(person) %>%
select(tail(df, 2))
with no joy.
Upvotes: 7
Views: 2140
Reputation: 70266
Since you order the data by person and date (i.e. you want the 2 latest dates per person), you can also use top_n()
in dplyr:
df %>% group_by(person) %>% top_n(2, date)
#Source: local data frame [6 x 2]
#Groups: person
#
# person date
#1 pA 2014-03-01
#2 pA 2014-04-01
#3 pB 2014-02-01
#4 pB 2014-03-01
#5 pC 2014-01-01
#6 pC 2014-02-01
Or, if you already order it, you could arrange it the other way before using slice:
df %>% arrange(person, desc(date)) %>% group_by(person) %>% slice(1:2)
#Source: local data frame [6 x 2]
#Groups: person
#
# person date
#1 pA 2014-04-01
#2 pA 2014-03-01
#3 pB 2014-03-01
#4 pB 2014-02-01
#5 pC 2014-02-01
#6 pC 2014-01-01
See here for a benchmark of a similar question.
Upvotes: 3
Reputation: 118789
Using data.table
:
setDT(df)[order(person), tail(.SD, 2L), by=person]
# person date
# 1: pA 2014-03-01
# 2: pA 2014-04-01
# 3: pB 2014-02-01
# 4: pB 2014-03-01
# 5: pC 2014-01-01
# 6: pC 2014-02-01
We order by person
and then group by person
and select the last two rows from the subset of data .SD
for each group.
Upvotes: 6
Reputation: 887048
You can try slice
library(dplyr)
df %>%
group_by(person) %>%
arrange(date, person) %>%
slice((n()-1):n())
# person date
#1 pA 2014-03-01
#2 pA 2014-04-01
#3 pB 2014-02-01
#4 pB 2014-03-01
#5 pC 2014-01-01
#6 pC 2014-02-01
Or in place of the last step
do(tail(., 2))
Upvotes: 8