I am trying to create a variable holding the difference in mean score between female and male students in each classroom. `class id` identifies the classroom, `gender` is each student's gender, and the last column is their score.
I want one mean-difference value (female (1) minus male (0)) per classroom.
My data looks like this:
# matrix() fills column by column: 12 class ids, then 12 genders, then 12 scores
data <- matrix(c(1,1,1,1,2,2,2,2,3,3,3,3,
0,1,1,0,1,0,0,1,0,1,1,0,
20,25,22,21,30,35,32,31,40,45,42,44),
nrow=12,
ncol=3)
colnames(data) <- c("class id","gender","score")
> data
class id gender score
[1,] 1 0 20
[2,] 1 1 25
[3,] 1 1 22
[4,] 1 0 21
[5,] 2 1 30
[6,] 2 0 35
[7,] 2 0 32
[8,] 2 1 31
[9,] 3 0 40
[10,] 3 1 45
[11,] 3 1 42
[12,] 3 0 44
I need it to be something like:
> data
class id mean score
[1,] 1 3
[2,] 2 -3
[3,] 3 1.5
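For reference, each expected value is the mean of the female scores minus the mean of the male scores within that class:

```latex
\begin{aligned}
\text{class 1: } & \tfrac{25+22}{2} - \tfrac{20+21}{2} = 23.5 - 20.5 = 3\\
\text{class 2: } & \tfrac{30+31}{2} - \tfrac{35+32}{2} = 30.5 - 33.5 = -3\\
\text{class 3: } & \tfrac{45+42}{2} - \tfrac{40+44}{2} = 43.5 - 42 = 1.5
\end{aligned}
```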
Any thoughts?
Thanks!
Here's a solution that uses tidyverse functions:
library(tidyverse)
data %>% as_tibble %>%
group_by(`class id`, gender) %>%
summarize(mean=mean(score)) %>%
spread(gender, mean) %>%
mutate(mean_score=`1`-`0`) %>%
select(`class id`, mean_score)
Working with a tibble or data.frame is much easier than working with a matrix, so the first step converts the input data. Next, we calculate the mean score for each gender within each class. Then `spread()` widens the result so that each class has a single record containing both gender means, and finally we take the difference. Note the backticks: they are needed because `class id` contains a space and the spread columns `1` and `0` start with digits, so none of these are syntactic R names.
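In current versions of tidyr, `spread()` is superseded by `pivot_wider()`. Below is a sketch of the same pipeline using it, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (needed for the `.groups` argument of `summarize()`):

```r
library(dplyr)
library(tidyr)

# Same input as in the question
data <- matrix(c(1,1,1,1,2,2,2,2,3,3,3,3,
                 0,1,1,0,1,0,0,1,0,1,1,0,
                 20,25,22,21,30,35,32,31,40,45,42,44),
               nrow = 12, ncol = 3)
colnames(data) <- c("class id", "gender", "score")

result <- data %>%
  as_tibble() %>%
  group_by(`class id`, gender) %>%
  # one mean per class/gender pair; drop the grouping afterwards
  summarize(mean = mean(score), .groups = "drop") %>%
  # one row per class, with columns `0` (male) and `1` (female)
  pivot_wider(names_from = gender, values_from = mean) %>%
  mutate(mean_score = `1` - `0`) %>%
  select(`class id`, mean_score)

result
```

The output has the same shape as the `spread()` version: one row per class with its female-minus-male mean difference.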
Alternatively, you could compute the difference directly within each class:
data %>% as_tibble %>%
group_by(`class id`) %>%
summarize(mean_score=mean(score[gender==1]) - mean(score[gender==0]))
This subsets the scores by gender inside each group, which avoids the reshaping step.
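If you would rather not depend on the tidyverse, a base R sketch with `tapply()` does the same thing directly on the matrix columns. The output column name `class_id` is an illustrative choice, not something from the question:

```r
# Same input as in the question
data <- matrix(c(1,1,1,1,2,2,2,2,3,3,3,3,
                 0,1,1,0,1,0,0,1,0,1,1,0,
                 20,25,22,21,30,35,32,31,40,45,42,44),
               nrow = 12, ncol = 3)
colnames(data) <- c("class id", "gender", "score")

# Mean score for every class/gender combination:
# rows are class ids, columns are the gender codes "0" and "1"
means <- tapply(data[, "score"],
                list(data[, "class id"], data[, "gender"]),
                mean)

# Female (1) minus male (0) mean, one value per class
mean_score <- means[, "1"] - means[, "0"]

result <- data.frame(class_id = as.numeric(rownames(means)),
                     mean_score = unname(mean_score))
result
```

If a class had no students of one gender, the corresponding cell of `means` would be `NA`, and so would that class's difference.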