Reputation: 53

More efficient way to perform calculations on multiple (combined) columns by group

What is a more efficient way to perform calculations on multiple combined columns by group?

I have a dataset with Manager Effectiveness & Team Effectiveness components. How can I quickly calculate the number of 5s for each component by gender?

The desired outcome is like so:

Number of 5s for 'Manager effectiveness' = 2

Number of 5s for 'Team effectiveness' = 0

So far, I've tried the dplyr method:

Data %>% group_by(gender) %>% summarise(sum(c(Manager EQ, Manager IQ)) == 5)

Data %>% group_by(gender) %>% summarise(sum(c(Team collaboration, Team friendliness)) == 5)

Though it works, typing each column name quickly becomes tedious and error-prone as more columns are involved.

Upvotes: 1

Answers (2)

mrhellmann

Reputation: 5529

Mostly expanding on @akrun's answer:

## made up data 100 observations
set.seed(133)
dat <- 1:5
gen <- c("M", "F")
z <- tibble(me = sample(dat, 100, TRUE), 
                mi = sample(dat, 100, TRUE),
                tc = sample(dat, 100, TRUE),
                tf = sample(dat, 100, TRUE),
                gender = sample(gen, 100, TRUE))

# Grouping by gender, counting 5's, and reshaping data
z %>% 
  group_by(gender) %>%
  summarise_at(vars(everything()), ~ sum(. == 5)) %>%
  pivot_longer(me:tf) %>%
  mutate(name = paste0("# 5's for ", name)) %>% 
  pivot_wider(gender)

Output:

# A tibble: 2 x 5
  gender `# 5's for me` `# 5's for mi` `# 5's for tc` `# 5's for tf`
  <chr>           <int>          <int>          <int>          <int>
1 F                   6              6              8              5
2 M                  10             14             20              5

This is starting to get a little hack-ey, but in response to Amanda's comment & my misunderstanding of the question:

z %>% 
  group_by(gender) %>%
  summarise_at(vars(everything()), ~ sum(. == 5)) %>%
  pivot_longer(me:tf) %>%
  mutate(name = paste0("# 5's for ", name)) %>%
  mutate(grp = ifelse(str_detect(name, 'm'), 'manager', 'team')) %>% 
  group_by(gender, grp) %>%
  summarise(total_5s = sum(value))

Gives results:

# A tibble: 4 x 3
# Groups:   gender [2]
  gender grp     total_5s
  <chr>  <chr>      <int>
1 F      manager       12
2 F      team          13
3 M      manager       24
4 M      team          25

Unfortunately this relies heavily on making a distinction and group based on the column names of the original data.

Upvotes: 1

akrun

Reputation: 887501

We can use summarise_at

library(dplyr)
Data %>%
   group_by(gender) %>%
   summarise_at(vars(starts_with('Manager')), ~ sum(. == 5))

Or if we are checking the sum of all numeric columns, use summarise_if

Data %>%
   group_by(gender) %>%
   summarise_if(is.numeric, ~ sum(. == 5))

Can we wrapped in a function

f1 <- function(dat, colPrefix, grp, val) {
       dat %>%
          group_by_at(grp) %>%
          summarise_at(vars(starts_with(colPrefix)), ~ sum(. == val))
  }

f1(Data, "Manager", "gender", 5)

Upvotes: 3

More efficient way to perform calculations on multiple (combined) columns by group

Answers (2)

Related Questions