Reputation: 821

Aggregate function in R using two columns simultaneously

Data:-

df=data.frame(Name=c("John","John","Stacy","Stacy","Kat","Kat"),Year=c(2016,2015,2014,2016,2006,2006),Balance=c(100,150,65,75,150,10))

   Name Year Balance
1  John 2016     100
2  John 2015     150
3 Stacy 2014      65
4 Stacy 2016      75
5   Kat 2006     150
6   Kat 2006      10

Code:-

aggregate(cbind(Year,Balance)~Name,data=df,FUN=max )

Output:-

   Name Year Balance
1  John 2016     150
2   Kat 2006     150
3 Stacy 2016      75

I want to summarize the above data frame using two columns, Year and Balance, and I used the base function aggregate to do this. For each name I need the maximum balance within the most recent year. In the first row of the output, John has the latest year (2016) but the balance from 2015, which is not what I need: it should output 100, not 150. Where am I going wrong?

Upvotes: 3

Answers (3)

xjf

Reputation: 141

Here is another solution without the data.table package.

First, sort the data frame by descending Year and then descending Balance:

df <- df[order(-df$Year, -df$Balance),]

Then select the first row in each group of rows that share the same name:

df[!duplicated(df$Name),]
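As a minimal self-contained sketch of the two steps above, using the data frame from the question (the names `sorted` and `latest` are only illustrative, not part of the original answer):

```r
# Data from the question
df <- data.frame(
  Name    = c("John", "John", "Stacy", "Stacy", "Kat", "Kat"),
  Year    = c(2016, 2015, 2014, 2016, 2006, 2006),
  Balance = c(100, 150, 65, 75, 150, 10)
)

# Most recent year first; within a year, highest balance first
sorted <- df[order(-df$Year, -df$Balance), ]

# duplicated() flags every repeat of a name, so negating it
# keeps only the first (i.e. top-sorted) row for each name
latest <- sorted[!duplicated(sorted$Name), ]
latest
```

Working on copies rather than overwriting `df` keeps the original row order available, which may or may not matter for your use case.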

Upvotes: 3

xirururu

Reputation: 5508

I suggest using the dplyr package:

library(dplyr)

data.frame(Name=c("John","John","Stacy","Stacy","Kat","Kat"),
           Year=c(2016,2015,2014,2016,2006,2006),
           Balance=c(100,150,65,75,150,10)) %>% # create the data frame
    as_tibble() %>% # convert it to a tibble
    group_by(Name, Year) %>% # group it by Name and Year
    summarise(maxBalance=max(Balance)) %>% # maximum balance for each Name/Year pair
    group_by(Name) %>% # regroup the resulting data frame by Name
    top_n(1, Year) # keep the row with the most recent Year for each Name

Upvotes: 3

eddi

Reputation: 49448

Somewhat ironically, aggregate is a poor tool for aggregating. You could make it work, but I'd instead do:

library(data.table)

setDT(df)[order(-Year, -Balance), .SD[1], by = Name]
#    Name Year Balance
#1:  John 2016     100
#2: Stacy 2016      75
#3:   Kat 2006     150
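For comparison, here is one way aggregate could be made to work in base R. This is only a sketch of a possible approach, not necessarily what was meant above, and the names `latest_year`, `latest_rows` and `result` are illustrative. The idea is to find each person's latest year first, restrict the data to those rows, and only then take the maximum balance:

```r
# Data from the question
df <- data.frame(
  Name    = c("John", "John", "Stacy", "Stacy", "Kat", "Kat"),
  Year    = c(2016, 2015, 2014, 2016, 2006, 2006),
  Balance = c(100, 150, 65, 75, 150, 10)
)

# Step 1: the most recent year for each name
latest_year <- aggregate(Year ~ Name, data = df, FUN = max)

# Step 2: keep only the rows that fall in each name's most recent year
latest_rows <- merge(df, latest_year, by = c("Name", "Year"))

# Step 3: the maximum balance within those remaining rows
result <- aggregate(Balance ~ Name + Year, data = latest_rows, FUN = max)
result
```

The original call went wrong because `cbind(Year, Balance) ~ Name` applies `max` to each column independently, so the year and the balance in a result row can come from different input rows.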

Upvotes: 7
