Reputation: 572
I have a data frame with 1000's record for many employee and I would like to group the employee based on the Months column such that I can calculate mean for the 1st,2nd,3rd months and if the employee has experienced in months column then group it such that only the 1st experienced value is considered for each employee and then take the mean for those value only for month_1,month_2 and month_3 normal grouping needs to be done and then take the mean for those values.
Please find below the data frame total
name <-c("tom","harry","tom","tom","tom","tom","harry","harry","tom","harry","harry","harry"
,"sam","sam","sam","sam","sam","tim","tim")
name<-as.data.frame(name)
months<-c("Month_0","Month_1","Month_1","Month_2","Month_3","Experienced","Month_2","Month_3","Experienced","Experienced","Experienced","Experienced","Month_0","Month_1","Month_2","Month_3","Experienced","Month_2","Experienced")
months<-as.data.frame(months)
values <-c(10,1,20,30,40,50,2,3,60,4,5,6,1,2,3,4,5,100,400)
values<-as.data.frame(values)
total<-cbind(name,months,values)
i tried using dplyr
total<-total%>%group_by(months)%>%mutate(average = mean(values)
but this takes mean of all the experience values , However I need only the 1st experienced value and then there mean value.
Please find below the input and output snapshot
Upvotes: 1
Views: 55
Reputation: 4358
in Base R
means <- aggregate(values ~ months, total, mean)
means[means$months == "Experienced",2] <- mean(with( total[total$months=="Experienced",], tapply(values,name, function(x) x[1])))
months values
1 Experienced 114.750000
2 Month_0 5.500000
3 Month_1 7.666667
4 Month_2 33.750000
5 Month_3 15.666667
Upvotes: 1
Reputation: 51894
Are you looking for something like this ?
require(dplyr)
total %>%
group_by(months) %>%
filter(!duplicated(name)) %>%
summarise(average = mean(values))
# A tibble: 5 x 2
months average
<fct> <dbl>
1 Experienced 115.
2 Month_0 5.5
3 Month_1 7.67
4 Month_2 33.8
5 Month_3 15.7
!duplicated(name)
will by default keep the first value.
Upvotes: 1
Reputation: 39585
This can help you:
library(dplyr)
total %>% filter(!duplicated(paste0(name,months,fromLast=T)))%>%
group_by(months)%>%summarise(average = mean(values))
# A tibble: 5 x 2
months average
<fct> <dbl>
1 Experienced 115.
2 Month_0 5.5
3 Month_1 7.67
4 Month_2 33.8
5 Month_3 15.7
Upvotes: 1