I am using dplyr to summarise a dataset, but it's giving the wrong result. My code is below:
Raw_Grp <- Raw_data %>%
  dplyr::group_by(as.character(Raw_data$Gardu)) %>%
  dplyr::summarize(Avg = mean(Raw_data$Age))
Below is the output of str(Raw_data):
'data.frame':	3016 obs. of  2 variables:
$ Kecamatan: chr "CENGKARENG" "CENGKARENG" "CENGKARENG" "CENGKARENG" ...
$ Age : num 377 370 352 313 299 291 260 223 207 200 ...
Ideally I should get one mean per group, but instead the overall mean is repeated for every distinct group. I have searched and tried most of the options I could find, such as converting to a data.table, but I get the same result. If I do the same grouping in Excel or another tool, it gives the correct results. Please help.
When we use `Raw_data$columnname`, it extracts the entire column from the original data frame, so the grouping set by `group_by` is ignored and every group gets the same value. Inside the pipe, refer to the columns of interest by their bare names instead:
library(dplyr)
Raw_data %>%
  group_by(Gardu) %>%
  summarise(Avg = mean(Age))
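To see the difference side by side, here is a minimal sketch on a small hypothetical data frame `df` (not the asker's data) with two groups whose ages are clearly different. Referring to `df$Age` inside `summarise()` pulls in the whole column, while the bare name `Age` only sees the current group's rows.

```r
library(dplyr)

# Hypothetical toy data: group "a" has ages 10 and 20, group "b" has 30 and 40
df <- data.frame(Gardu = c("a", "a", "b", "b"), Age = c(10, 20, 30, 40))

# df$Age is the full column, so every group gets the overall mean (25)
wrong <- df %>%
  group_by(Gardu) %>%
  summarise(Avg = mean(df$Age))

# The bare name Age is evaluated per group
right <- df %>%
  group_by(Gardu) %>%
  summarise(Avg = mean(Age))

print(wrong)  # Avg is 25 for both groups
print(right)  # Avg is 15 for "a" and 35 for "b"
```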
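But there are cases where we do need the entire column. For example, to count how many 'Age' values within each 'Gardu' are below the mean of the whole 'Age' column, we can use `.`, which is the data frame piped into `summarise()`, so `.$Age` is the whole column rather than the current group's rows:
Raw_data %>%
  group_by(Gardu) %>%
  summarise(n = sum(Age < mean(.$Age)))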
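An alternative that avoids the `.` placeholder (the native `|>` pipe, available from R 4.1, has no `.` placeholder) is to compute the whole-column value with `mutate()` before grouping. This is a sketch under the assumption that the comparison of interest is against the overall mean of 'Age'; the column names `overall_mean` and `n_below` are hypothetical, and the data uses the same values as the sample data in this answer.

```r
library(dplyr)

# Same values as the sample data in this answer
Raw_data <- data.frame(
  Gardu = factor(rep(c("a", "b", "c"), each = 5)),
  Age = c(21, 19, 38, 31, 37, 47, 21, 41, 42, 20, 34, 25, 37, 37, 23)
)

res <- Raw_data |>
  # Computed before group_by(), so this is the mean of the whole Age column
  mutate(overall_mean = mean(Age)) |>
  group_by(Gardu) |>
  # Age and overall_mean now only hold the current group's rows
  summarise(n_below = sum(Age < overall_mean))

print(res)  # n_below is 3 for "a", 2 for "b", 2 for "c"
```

Because the overall value is stored as an ordinary column, the grouped step never needs to reach outside the current group.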
Data used in the examples:
Raw_data <- structure(list(Gardu = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
    2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"),
    Age = c(21L, 19L, 38L, 31L, 37L, 47L, 21L, 41L, 42L, 20L,
    34L, 25L, 37L, 37L, 23L)), class = "data.frame", row.names = c(NA,
    -15L))
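Since the question compares against Excel, an independent cross-check with base R's `aggregate()` can confirm the per-group means without dplyr. A minimal sketch using the same values as the sample data; `base_means` and `dplyr_means` are illustrative names.

```r
library(dplyr)

# Same values as the sample data above
Raw_data <- data.frame(
  Gardu = factor(rep(c("a", "b", "c"), each = 5)),
  Age = c(21, 19, 38, 31, 37, 47, 21, 41, 42, 20, 34, 25, 37, 37, 23)
)

# Base R formula interface: mean of Age for each level of Gardu
base_means <- aggregate(Age ~ Gardu, data = Raw_data, FUN = mean)

# The corrected dplyr version, using the bare column name
dplyr_means <- Raw_data %>%
  group_by(Gardu) %>%
  summarise(Age = mean(Age))

print(base_means)  # a: 29.2, b: 34.2, c: 31.2
stopifnot(isTRUE(all.equal(base_means$Age, dplyr_means$Age)))
```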