Reputation: 1078
I have a data frame with ~150K rows and 77 categorical variables in a form such as the below. How do I found the Score and count for each category
One numeric variable and 77 grouping variables
students<-data.frame(ID = c("A","B","C","D"), Gender = c("M","F","F","F"), Socioeconomic = c("Low","Low","Medium","High"), Subject = c("Maths","Maths","Science", "Science"),
Scores = c(45,98, 50,38))
That is I do not want to have to go through each categorical column individually 77 times but want a tibble that contains a list of the outputs for each of the below
students %>% group_by(Gender) %>% summarise(Mean.score = mean(Scores), Count = length(ID))
students %>% group_by(Socioeconomic) %>% summarise(Mean.score = mean(Scores), Count = length(ID))
students %>% group_by(Subject) %>% summarise(Mean.score = mean(Scores), Count = length(ID))```
Upvotes: 2
Views: 877
Reputation: 93761
Here are two options:
library(tidyverse)
# map successively over each categorical column
map(students %>% select(-Scores, -ID) %>% names() %>% set_names(),
~ students %>%
group_by_at(.x) %>%
summarise(Mean.score = mean(Scores),
Count = n())
)
$Gender # A tibble: 2 x 3 Gender Mean.score Count <fct> <dbl> <int> 1 F 62 3 2 M 45 1 $Socioeconomic # A tibble: 3 x 3 Socioeconomic Mean.score Count <fct> <dbl> <int> 1 High 38 1 2 Low 71.5 2 3 Medium 50 1 $Subject # A tibble: 2 x 3 Subject Mean.score Count <fct> <dbl> <int> 1 Maths 71.5 2 2 Science 44 2
# Convert to long format, group, then summarize
students %>%
gather(key, value, -ID, -Scores) %>%
group_by(key, value) %>%
summarise(Count=n(),
Mean.score=mean(Scores))
key value Count Mean.score <chr> <chr> <int> <dbl> 1 Gender F 3 62 2 Gender M 1 45 3 Socioeconomic High 1 38 4 Socioeconomic Low 2 71.5 5 Socioeconomic Medium 1 50 6 Subject Maths 2 71.5 7 Subject Science 2 44
Upvotes: 2