Will T-E
Will T-E

Reputation: 637

How to select 'x' most recent values in each group in R?

I'm trying to select/filter the most recent values in each group in a data frame in R. For example, I would like to select the 3 most recent (i.e. the dates closest to today) from each Name group in the following data frame:

Player  Date    Result
 Sam    03/15/2015  1
 Sam    03/22/2015  0
 Sam    04/04/2015  2
 Sam    04/12/2015  1
 Sam    04/18/2015  1
 Sam    04/26/2015  0
 Sam    08/08/2015  3
Steve   02/17/2015  0
Steve   02/21/2015  0
Steve   03/04/2015  4
Steve   03/11/2015  2
Steve   03/15/2015  1
Steve   03/22/2015  0
Steve   04/12/2015  0
Steve   04/18/2015  2
Steve   04/26/2015  1
Steve   04/29/2015  2
Steve   08/16/2015  4
Jasper  03/15/2015  3
Jasper  03/22/2015  3.5
Jasper  04/04/2015  4
Jasper  04/12/2015  4
Jasper  04/18/2015  5
Jasper  04/26/2015  0

I have written as.date() code so R now understands the date format but what code can I now use to only select the 3 (say) most recent values from each group?

Upvotes: 2

Views: 7360

Answers (1)

akrun
akrun

Reputation: 887048

We can use dplyr. We convert the 'Date' to Date class by using as.Date. After grouping by 'Player', we arrange the 'Date' column descendingly and use slice to get the most recent 3 values. If we don't want to change the 'Date' class, we can remove the mutate step and do the conversion within the arrange i.e. arrange(desc(as.Date(Date, '%m/%d/%Y')))

library(dplyr)
df1 %>%
    mutate(Date=as.Date(Date, '%m/%d/%Y')) %>% 
    group_by(Player) %>% 
    arrange(desc(Date)) %>% 
    slice(1:3)
#    Player       Date Result
#1 Jasper 2015-04-26      0
#2 Jasper 2015-04-18      5
#3 Jasper 2015-04-12      4
#4    Sam 2015-08-08      3
#5    Sam 2015-04-26      0
#6    Sam 2015-04-18      1
#7  Steve 2015-08-16      4
#8  Steve 2015-04-29      2
#9  Steve 2015-04-26      1

Or after we group by the 'Player', we can use top_n by specifying the 'n' and the 'wt' variable for ordering.

 df1 %>% 
   mutate(Date=as.Date(Date, '%m/%d/%Y')) %>%
   group_by(Player)  %>%
   top_n(n = 3, Date)
#  Player       Date Result
#1    Sam 2015-04-18      1
#2    Sam 2015-04-26      0
#3    Sam 2015-08-08      3
#4  Steve 2015-04-26      1
#5  Steve 2015-04-29      2
#6  Steve 2015-08-16      4
#7 Jasper 2015-04-12      4
#8 Jasper 2015-04-18      5
#9 Jasper 2015-04-26      0

Using data.table, we convert the 'data.frame' to 'data.table' (setDT(df1)). Grouped by the 'Player, we order the 'Date' after converting to Date class, and using the head we can get the first 3 rows of each group.

library(data.table)
setDT(df1)[order(-as.IDate(Date, '%m/%d/%Y')),head(.SD, 3) , by = Player]
#   Player       Date Result
#1:  Steve 08/16/2015      4
#2:  Steve 04/29/2015      2
#3:  Steve 04/26/2015      1
#4:    Sam 08/08/2015      3
#5:    Sam 04/26/2015      0
#6:    Sam 04/18/2015      1
#7: Jasper 04/26/2015      0
#8: Jasper 04/18/2015      5
#9: Jasper 04/12/2015      4

data

df1 <- structure(list(Player = c("Sam", "Sam", "Sam", "Sam", "Sam", 
"Sam", "Sam", "Steve", "Steve", "Steve", "Steve", "Steve", "Steve", 
"Steve", "Steve", "Steve", "Steve", "Steve", "Jasper", "Jasper", 
"Jasper", "Jasper", "Jasper", "Jasper"), Date = c("03/15/2015", 
"03/22/2015", "04/04/2015", "04/12/2015", "04/18/2015", "04/26/2015", 
"08/08/2015", "02/17/2015", "02/21/2015", "03/04/2015", "03/11/2015", 
"03/15/2015", "03/22/2015", "04/12/2015", "04/18/2015", "04/26/2015", 
"04/29/2015", "08/16/2015", "03/15/2015", "03/22/2015", "04/04/2015", 
"04/12/2015", "04/18/2015", "04/26/2015"), Result = c(1, 0, 2, 
1, 1, 0, 3, 0, 0, 4, 2, 1, 0, 0, 2, 1, 2, 4, 3, 3.5, 4, 4, 5, 
0)), .Names = c("Player", "Date", "Result"),
class = "data.frame", row.names = c(NA,  -24L))

Upvotes: 8

Related Questions