Creating variables using existing variables in the dataset

Question

I'm trying to create a variable holding the mean score difference between male and female students for each classroom. The "class id" column identifies the classroom, "gender" is recorded for each student, and the last column is the student's score.

I want the mean difference (female (1) minus male (0)) for each classroom.

My data looks like this:

data <- matrix(c(1,1,1,1,2,2,2,2,3,3,3,3,
                 0,1,1,0,1,0,0,1,0,1,1,0,
                 20,25,22,21,30,35,32,31,40,45,42,44), 
                 nrow=12, 
                 ncol=3) 
colnames(data) <- c("class id","gender","score")

> data
      class id gender score
 [1,]        1      0    20
 [2,]        1      1    25
 [3,]        1      1    22
 [4,]        1      0    21
 [5,]        2      1    30
 [6,]        2      0    35
 [7,]        2      0    32
 [8,]        2      1    31
 [9,]        3      0    40
[10,]        3      1    45
[11,]        3      1    42
[12,]        3      0    44

I need it to be something like:

> data
            class id  mean score
 [1,]        1             3
 [2,]        2            -3
 [3,]        3            1.5

Any thoughts?

Thanks!

MrFlick · Accepted Answer

Here's a solution that uses tidyverse functions:

library(tidyverse)
data %>% as_tibble %>% 
  group_by(`class id`, gender) %>% 
  summarize(mean=mean(score)) %>% 
  spread(gender, mean) %>% 
  mutate(mean_score=`1`-`0`) %>% 
  select(`class id`, mean_score)

Working with a tibble or data.frame is much easier than a matrix, so you start by converting your input data. Then we calculate the mean score for each class and gender, spread the result so that each class has one row with a column per gender, and take the difference. Note the backticks: they are needed because the column names in this example ("class id", "0", "1") are not syntactic R names.
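In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same pipeline with that function, assuming the matrix from the question; the result name class_diff and the .groups = "drop" argument (dplyr 1.0.0+) are additions for illustration, not part of the original answer:

```r
library(dplyr)
library(tidyr)

# Data from the question, rebuilt so the block runs on its own
data <- matrix(c(1,1,1,1,2,2,2,2,3,3,3,3,
                 0,1,1,0,1,0,0,1,0,1,1,0,
                 20,25,22,21,30,35,32,31,40,45,42,44),
               nrow = 12, ncol = 3)
colnames(data) <- c("class id", "gender", "score")

# class_diff is a hypothetical name for the result
class_diff <- data %>%
  as_tibble() %>%
  group_by(`class id`, gender) %>%
  # mean score per class and gender; drop grouping afterwards
  summarize(mean = mean(score), .groups = "drop") %>%
  # one column per gender value: `0` (male) and `1` (female)
  pivot_wider(names_from = gender, values_from = mean) %>%
  mutate(mean_score = `1` - `0`) %>%
  select(`class id`, mean_score)

class_diff
```

With the sample data this should give 3, -3 and 1.5 for classes 1, 2 and 3, matching the output requested in the question.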

Alternatively, you could do something like this:

data %>% as_tibble %>%
  group_by(`class id`) %>% 
  summarize(mean_score=mean(score[gender==1]) - mean(score[gender==0]))

which avoids the reshaping.
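If you'd rather not load the tidyverse at all, the same per-class difference can be sketched in base R with tapply(), working directly on the matrix. This is only an illustration under the assumption that gender is coded 0/1 as in the question; the names means and diff_by_class are hypothetical:

```r
# Data from the question, rebuilt so the block runs on its own
data <- matrix(c(1,1,1,1,2,2,2,2,3,3,3,3,
                 0,1,1,0,1,0,0,1,0,1,1,0,
                 20,25,22,21,30,35,32,31,40,45,42,44),
               nrow = 12, ncol = 3)
colnames(data) <- c("class id", "gender", "score")

# Rows are class ids, columns are gender codes ("0", "1"),
# cells are the mean score for that class/gender combination
means <- tapply(data[, "score"],
                list(data[, "class id"], data[, "gender"]),
                mean)

# Female (1) minus male (0), one row per class
diff_by_class <- data.frame(
  class_id   = as.numeric(rownames(means)),
  mean_score = unname(means[, "1"] - means[, "0"])
)

diff_by_class
```

A class with no students of one gender would get NA here (NaN in the summarize() version), so it may be worth checking for that in real data.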
