Soren Christensen
Soren Christensen

Reputation: 376

How to aggregate in R with conditions

x <- data.frame(names=c("peter","peter", "jacob", "jacob"),
            some_score =c(5,8,6,8),
            xDate = as.Date(c("2018-01-01", "2019-01-01" , "2018-06-08", 
"2019-10-22"))

)

In this dataframe in need to aggregate so i get the latest score for each name (peter= 8 and Jacob=8)

Is there a fast way to do that? Right now i am creating two aggregate dataframes and linking them. But this seems inefficient

Upvotes: 1

Views: 62

Answers (5)

hello_friend
hello_friend

Reputation: 5788

Base R one liner:

data.frame(do.call("rbind", lapply(split(x, x$names), function(x){x[which.max(x$xDate),]})), 
           row.names = NULL)

Upvotes: 1

Marcos Augusto
Marcos Augusto

Reputation: 51

the dplyr package is great option for your question.

x <- data.frame(names=c("peter","peter", "jacob", "jacob"),
                some_score =c(5,8,6,8),
                xDate = as.Date(c("2018-01-01", "2019-01-01" , "2018-06-08", 
                                  "2019-10-22")))
library(dplyr)
x %>% 
  group_by(names) %>% 
  summarise(max_some_score = max(some_score))

Upvotes: 0

Florian
Florian

Reputation: 1258

Another solution:

library(magrittr)

x %>% 
  dplyr::group_by(names) %>% 
  dplyr::filter(xDate == max(xDate))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388807

We could get the row with max xDate for each names.

This can be done using dplyr

library(dplyr)
x %>% group_by(names) %>% slice(which.max(xDate))

#  names  some_score  Date     
#   <chr>      <dbl> <date>    
#1 jacob          8 2019-10-22
#2 peter          8 2019-01-01

Or data.table

library(data.table)
setDT(x)[, .SD[which.max(xDate)], names]

Upvotes: 3

user2974951
user2974951

Reputation: 10375

> aggregate(x,list(x$names),tail,1)
  Group.1 names some_score      xDate
1   jacob jacob          8 2019-10-22
2   peter peter          8 2019-01-01

assuming your dataframe is sorted, otherwise sort it first by time.

Upvotes: 2

Related Questions