Group by, summarize and reshape data frame

Question

I have a raw data as below

raw_data <- data.frame(
                            name=c("Ronak","Bob","Moh"),
                            l_name=c("Shah","Marly","Salah"),
                            R_programming=c(1,5,2),
                            football=c(2,4,6),
                            snooker=c(6,3,2),
                            Python=c(3,2,6),
                            location=c("Maly","US","Maly")
                      )

I want to group by location and then summarize to takE mean of two columns in new column say programming and sport i.e

programing = mean(c(R_programming, python), na.rm=TRUE)
sport= mean(c(football,snooker),na.rm=TRUE)

then I calculate a score and scale based on above averages and reshape data

My expected output is

output <- data.frame(
                        location=c("US","Maly"),
                        indicator=rep(c("programming","sport"),each=2),
                        average=c(3,3.5,3.5,4),
                        score=c(0.6,0.7,0.7,0.8),
                        scale=c("Moderate high","high","high","high")
                    )

Thanks from your help in advance

Ronak Shah · Accepted Answer

group_by location and take average of programming and sports. Get the data in long format and create new columns for score and scale.

library(dplyr)

raw_data %>%
  group_by(location) %>%
  summarise(programming_average = mean(c(R_programming, Python), na.rm = TRUE), 
            sport_average = mean(c(football,snooker),na.rm=TRUE)) %>%
  tidyr::pivot_longer(cols = -location, 
               names_to = c('indicator', '.value'), 
               names_sep = '_') %>%
  mutate(score = average/5, 
         scale = case_when(score <= 0.6 ~ 'moderate high', 
                           score > 0.6 ~ 'high'))

#  location indicator  average score scale        
#                        
#1 Maly     programming     3     0.6 moderate high
#2 Maly     sport          4     0.8 high         
#3 US       programming     3.5   0.7 high         
#4 US       sport          3.5   0.7 high

Group by, summarize and reshape data frame

Answers (1)

Related Questions