Dplyr giving wrong results

Question

I am using dplyr to summarise a dataset, but it is giving the wrong result. My code is below:

Raw_Grp <- Raw_data %>% dplyr::group_by(as.character(Raw_data$Gardu)) %>%
    dplyr::summarize(Avg = mean(Raw_data$Age))

Below is the str() output:

'data.frame':    3016 obs. of  2 variables:
 $ Kecamatan: chr  "CENGKARENG" "CENGKARENG" "CENGKARENG" "CENGKARENG" ...
 $ Age      : num  377 370 352 313 299 291 260 223 207 200 ...

Ideally I should get one value per group, but instead I am getting the overall mean repeated for every distinct group. I have searched and tried many alternatives, such as converting to a data.table, but got the same result. If I do the same group-by in Excel or another tool, it gives the correct results. Please help.

akrun · Accepted Answer

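When we use Raw_data$columnname inside a dplyr verb, it extracts the entire column from the original data frame and ignores the grouping set up by group_by. That is why mean(Raw_data$Age) returns the overall mean for every group. The fix is to use only the bare column names: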

library(dplyr)
Raw_data %>% 
     group_by(Gardu) %>% 
     summarise(Avg = mean(Age))
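
To make the difference concrete, here is a small sketch that runs both versions side by side. It assumes a 15-row data frame with the same values as the sample data given at the end of this answer; the names wrong and right are only for illustration.

```r
library(dplyr)

# Illustrative data: three groups of five ages each
Raw_data <- data.frame(
  Gardu = factor(rep(c("a", "b", "c"), each = 5)),
  Age = c(21L, 19L, 38L, 31L, 37L, 47L, 21L, 41L, 42L, 20L,
          34L, 25L, 37L, 37L, 23L)
)

# Raw_data$Age is the full 15-element column, so every group
# gets the same overall mean (473 / 15, about 31.53)
wrong <- Raw_data %>%
  group_by(Gardu) %>%
  summarise(Avg = mean(Raw_data$Age))

# The bare name Age is evaluated within each group,
# giving 29.2, 34.2 and 31.2 for groups a, b and c
right <- Raw_data %>%
  group_by(Gardu) %>%
  summarise(Avg = mean(Age))

print(wrong)
print(right)
```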

There are cases, though, where we do need the entire column. For example, to compare the 'Age' values within each 'Gardu' group against the whole 'Age' column, we can reach the full data through the . placeholder, which stands for the data passed in by %>%:

Raw_data %>%
    group_by(Gardu) %>%
    summarise(n = sum(Age < .$Age))
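
As a rough check of what Age and .$Age refer to inside summarise, the sketch below just reports their lengths. It assumes the same illustrative 15-row data as the sample at the end of this answer and the magrittr pipe %>% (the native |> pipe has no . placeholder); the column names n_group and n_all are made up for the example.

```r
library(dplyr)

# Illustrative data: three groups of five ages each
Raw_data <- data.frame(
  Gardu = factor(rep(c("a", "b", "c"), each = 5)),
  Age = c(21L, 19L, 38L, 31L, 37L, 47L, 21L, 41L, 42L, 20L,
          34L, 25L, 37L, 37L, 23L)
)

sizes <- Raw_data %>%
  group_by(Gardu) %>%
  summarise(
    n_group = length(Age),   # Age: only the rows of the current group
    n_all   = length(.$Age)  # .$Age: the whole column of the piped data
  )

print(sizes)
```

Because the two vectors differ in length, a comparison such as Age < .$Age relies on R's recycling rules: the shorter group vector is repeated to match the full column. That is worth keeping in mind when group sizes do not divide the total number of rows evenly.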

data

Raw_data <- structure(list(Gardu = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), 
Age = c(21L, 19L, 38L, 31L, 37L, 47L, 21L, 41L, 42L, 20L, 
34L, 25L, 37L, 37L, 23L)), class = "data.frame", row.names = c(NA, 
-15L))
