Reputation: 637
I'm trying to select/filter the most recent values in each group in a data frame in R. For example, I would like to select the 3 most recent (i.e. the dates closest to today) from each Name group in the following data frame:
Player Date Result
Sam 03/15/2015 1
Sam 03/22/2015 0
Sam 04/04/2015 2
Sam 04/12/2015 1
Sam 04/18/2015 1
Sam 04/26/2015 0
Sam 08/08/2015 3
Steve 02/17/2015 0
Steve 02/21/2015 0
Steve 03/04/2015 4
Steve 03/11/2015 2
Steve 03/15/2015 1
Steve 03/22/2015 0
Steve 04/12/2015 0
Steve 04/18/2015 2
Steve 04/26/2015 1
Steve 04/29/2015 2
Steve 08/16/2015 4
Jasper 03/15/2015 3
Jasper 03/22/2015 3.5
Jasper 04/04/2015 4
Jasper 04/12/2015 4
Jasper 04/18/2015 5
Jasper 04/26/2015 0
I have written as.date()
code so R now understands the date format but what code can I now use to only select the 3 (say) most recent values from each group?
Upvotes: 2
Views: 7360
Reputation: 887048
We can use dplyr
. We convert the 'Date' to Date
class by using as.Date
. After grouping by 'Player', we arrange
the 'Date' column descendingly and use slice
to get the most recent 3 values. If we don't want to change the 'Date' class, we can remove the mutate
step and do the conversion within the arrange
i.e. arrange(desc(as.Date(Date, '%m/%d/%Y')))
library(dplyr)
df1 %>%
mutate(Date=as.Date(Date, '%m/%d/%Y')) %>%
group_by(Player) %>%
arrange(desc(Date)) %>%
slice(1:3)
# Player Date Result
#1 Jasper 2015-04-26 0
#2 Jasper 2015-04-18 5
#3 Jasper 2015-04-12 4
#4 Sam 2015-08-08 3
#5 Sam 2015-04-26 0
#6 Sam 2015-04-18 1
#7 Steve 2015-08-16 4
#8 Steve 2015-04-29 2
#9 Steve 2015-04-26 1
Or after we group by the 'Player', we can use top_n
by specifying the 'n' and the 'wt' variable for ordering.
df1 %>%
mutate(Date=as.Date(Date, '%m/%d/%Y')) %>%
group_by(Player) %>%
top_n(n = 3, Date)
# Player Date Result
#1 Sam 2015-04-18 1
#2 Sam 2015-04-26 0
#3 Sam 2015-08-08 3
#4 Steve 2015-04-26 1
#5 Steve 2015-04-29 2
#6 Steve 2015-08-16 4
#7 Jasper 2015-04-12 4
#8 Jasper 2015-04-18 5
#9 Jasper 2015-04-26 0
Using data.table
, we convert the 'data.frame' to 'data.table' (setDT(df1)
). Grouped by the 'Player, we order
the 'Date' after converting to Date
class, and using the head
we can get the first 3 rows of each group.
library(data.table)
setDT(df1)[order(-as.IDate(Date, '%m/%d/%Y')),head(.SD, 3) , by = Player]
# Player Date Result
#1: Steve 08/16/2015 4
#2: Steve 04/29/2015 2
#3: Steve 04/26/2015 1
#4: Sam 08/08/2015 3
#5: Sam 04/26/2015 0
#6: Sam 04/18/2015 1
#7: Jasper 04/26/2015 0
#8: Jasper 04/18/2015 5
#9: Jasper 04/12/2015 4
df1 <- structure(list(Player = c("Sam", "Sam", "Sam", "Sam", "Sam",
"Sam", "Sam", "Steve", "Steve", "Steve", "Steve", "Steve", "Steve",
"Steve", "Steve", "Steve", "Steve", "Steve", "Jasper", "Jasper",
"Jasper", "Jasper", "Jasper", "Jasper"), Date = c("03/15/2015",
"03/22/2015", "04/04/2015", "04/12/2015", "04/18/2015", "04/26/2015",
"08/08/2015", "02/17/2015", "02/21/2015", "03/04/2015", "03/11/2015",
"03/15/2015", "03/22/2015", "04/12/2015", "04/18/2015", "04/26/2015",
"04/29/2015", "08/16/2015", "03/15/2015", "03/22/2015", "04/04/2015",
"04/12/2015", "04/18/2015", "04/26/2015"), Result = c(1, 0, 2,
1, 1, 0, 3, 0, 0, 4, 2, 1, 0, 0, 2, 1, 2, 4, 3, 3.5, 4, 4, 5,
0)), .Names = c("Player", "Date", "Result"),
class = "data.frame", row.names = c(NA, -24L))
Upvotes: 8