Dexter1611
Dexter1611

Reputation: 572

Grouping the column with specified logic in R

I have a data frame with 1000's record for many employee and I would like to group the employee based on the Months column such that I can calculate mean for the 1st,2nd,3rd months and if the employee has experienced in months column then group it such that only the 1st experienced value is considered for each employee and then take the mean for those value only for month_1,month_2 and month_3 normal grouping needs to be done and then take the mean for those values.

Please find below the data frame total

name <-c("tom","harry","tom","tom","tom","tom","harry","harry","tom","harry","harry","harry"
         ,"sam","sam","sam","sam","sam","tim","tim")
name<-as.data.frame(name)
months<-c("Month_0","Month_1","Month_1","Month_2","Month_3","Experienced","Month_2","Month_3","Experienced","Experienced","Experienced","Experienced","Month_0","Month_1","Month_2","Month_3","Experienced","Month_2","Experienced")
months<-as.data.frame(months)
values <-c(10,1,20,30,40,50,2,3,60,4,5,6,1,2,3,4,5,100,400)
values<-as.data.frame(values)
total<-cbind(name,months,values)

i tried using dplyr total<-total%>%group_by(months)%>%mutate(average = mean(values) but this takes mean of all the experience values , However I need only the 1st experienced value and then there mean value.

Please find below the input and output snapshot enter image description here

Upvotes: 1

Views: 55

Answers (3)

Daniel O
Daniel O

Reputation: 4358

in Base R

means <- aggregate(values ~ months, total, mean)
means[means$months == "Experienced",2] <- mean(with( total[total$months=="Experienced",], tapply(values,name, function(x) x[1])))

       months     values
1 Experienced 114.750000
2     Month_0   5.500000
3     Month_1   7.666667
4     Month_2  33.750000
5     Month_3  15.666667

Upvotes: 1

Ma&#235;l
Ma&#235;l

Reputation: 51894

Are you looking for something like this ?

require(dplyr)

total %>% 
    group_by(months) %>% 
    filter(!duplicated(name)) %>% 
    summarise(average = mean(values))

# A tibble: 5 x 2
  months      average
  <fct>         <dbl>
1 Experienced  115.  
2 Month_0        5.5 
3 Month_1        7.67
4 Month_2       33.8 
5 Month_3       15.7

!duplicated(name) will by default keep the first value.

Upvotes: 1

Duck
Duck

Reputation: 39585

This can help you:

library(dplyr)

total %>% filter(!duplicated(paste0(name,months,fromLast=T)))%>%
  group_by(months)%>%summarise(average = mean(values))

# A tibble: 5 x 2
  months      average
  <fct>         <dbl>
1 Experienced  115.  
2 Month_0        5.5 
3 Month_1        7.67
4 Month_2       33.8 
5 Month_3       15.7 

Upvotes: 1

Related Questions