user3594490

Reputation: 1999

Summary statistic by levels of two separate categorical variables

I have created the following data frame:

Group <- c('A','A','A','B','B','B','B','C','C','C','A','A','A','B','B','B','B','C','C','C')
YearWeek <-c('201401','201401','201401','201401','201401','201401','201401','201401','201401','201401','201402','201402','201402','201402','201402','201402','201402','201402','201402','201402')
Score1 <-c(404,440,395,500,450,476,350,500,600,575,460,445,400,508,470,422,368,555,700,634)
employee <- c(1:20)
employ.data <- data.frame(employee, Group, YearWeek, Score1)

I want to calculate the mean of group 'A' (my control group) for each level of 'YearWeek', subtract it from Score1 for every employee (including the control-group employees) in the same YearWeek, and add the result to the data frame as a new variable 'Difference'.
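A minimal base-R sketch of the desired result, assuming the columns shown above (the data is rebuilt compactly so the block runs on its own; `ctrl_only` and `ctrl_mean` are illustrative names). It uses `ave()` to spread each week's control-group mean back onto every row of that week:

```r
# The question's data, written compactly (same rows, same order)
employ.data <- data.frame(
  employee = 1:20,
  Group    = rep(rep(c("A", "B", "C"), times = c(3, 4, 3)), 2),
  YearWeek = rep(c("201401", "201402"), each = 10),
  Score1   = c(404, 440, 395, 500, 450, 476, 350, 500, 600, 575,
               460, 445, 400, 508, 470, 422, 368, 555, 700, 634),
  stringsAsFactors = FALSE
)

# Keep Score1 only for control-group rows; all other rows become NA
ctrl_only <- ifelse(employ.data$Group == "A", employ.data$Score1, NA)

# Mean of the control rows within each YearWeek, repeated on every row of that week
ctrl_mean <- ave(ctrl_only, employ.data$YearWeek,
                 FUN = function(x) mean(x, na.rm = TRUE))

employ.data$Difference <- employ.data$Score1 - ctrl_mean
head(employ.data, 4)
```

For this data the control means are 413 (201401) and 435 (201402), so employee 1 gets 404 - 413 = -9, and the control rows' differences sum to zero within each week.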

I first tried to obtain the mean for group 'A' (the control-group employees), but received the following error and warning:

CTRLScore <- as.data.frame(employ.data[, j=list(mean(Score1),by = list(YearWeek,Group,"A"))]) 
Error in .subset(x, j) : invalid subscript type 'list'
In addition: Warning message:
In `[.data.frame`(employ.data, , j = list(mean(Score1), by = list(YearWeek,  :
  named arguments other than 'drop' are discouraged
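The `employ.data[, j = ..., by = ...]` form in the attempt is data.table-style syntax, but `employ.data` is a plain data.frame, whose `[` method has no `by` argument: the `list(...)` passed as `j` becomes an invalid subscript, and naming the `j =` argument triggers the warning. (With the data.table package the object would first need converting with `as.data.table()`.) A minimal base-R sketch of just that first step, the control-group mean per YearWeek, assuming the question's data (rebuilt compactly; `ctrl` is an illustrative name):

```r
# The question's data, written compactly (same rows, same order)
employ.data <- data.frame(
  employee = 1:20,
  Group    = rep(rep(c("A", "B", "C"), times = c(3, 4, 3)), 2),
  YearWeek = rep(c("201401", "201402"), each = 10),
  Score1   = c(404, 440, 395, 500, 450, 476, 350, 500, 600, 575,
               460, 445, 400, 508, 470, 422, 368, 555, 700, 634),
  stringsAsFactors = FALSE
)

# Restrict to the control group, then average Score1 within each YearWeek
ctrl <- aggregate(Score1 ~ YearWeek,
                  data = subset(employ.data, Group == "A"),
                  FUN = mean)
ctrl
# Score1 column: 413 for 201401, 435 for 201402
```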

Upvotes: 1

Views: 662

Answers (3)

Jaap

Reputation: 83255

A dplyr variation on @MrFlick's answer:

# calculating the means
ctrlmeans <- with(subset(employ.data, Group=="A"), tapply(Score1, YearWeek, mean))

# adding the difference to the data.frame
# (%>% replaces the deprecated %.% operator from early dplyr releases)
library(dplyr)
employ.data <- employ.data %>%
  mutate(Difference = Score1 - ctrlmeans[as.character(YearWeek)])
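A variant that stays entirely inside dplyr, sketched under the assumption of a current dplyr release: grouping by YearWeek lets `mean(Score1[Group == "A"])` be computed per week without the separate `ctrlmeans` lookup. The data is rebuilt compactly so the block runs on its own:

```r
library(dplyr)

# The question's data, written compactly (same rows, same order)
employ.data <- data.frame(
  employee = 1:20,
  Group    = rep(rep(c("A", "B", "C"), times = c(3, 4, 3)), 2),
  YearWeek = rep(c("201401", "201402"), each = 10),
  Score1   = c(404, 440, 395, 500, 450, 476, 350, 500, 600, 575,
               460, 445, 400, 508, 470, 422, 368, 555, 700, 634),
  stringsAsFactors = FALSE
)

employ.data <- employ.data %>%
  group_by(YearWeek) %>%
  # Within each week, subtract the mean of that week's group-A scores
  mutate(Difference = Score1 - mean(Score1[Group == "A"])) %>%
  ungroup()
```

The result is a tibble in the original row order; `as.data.frame()` converts it back if a plain data.frame is needed.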

Upvotes: 0

salemmarafi

Reputation: 230

This seems to work for me:

library(reshape)
melted <- melt(employ.data)
casted <- cast(melted, formula = Group + YearWeek ~ variable,
               subset = variable == "Score1", fun.aggregate = mean)

# Print out the per-Group, per-YearWeek means
casted

# Holder variable for the differences
addColumn <- NULL

# Note: each row's own Group/YearWeek mean is subtracted here
for (i in 1:nrow(employ.data))
{
  score <- employ.data[i, ]$Score1
  group <- employ.data[i, ]$Group
  yearWeek <- employ.data[i, ]$YearWeek
  sub <- casted[casted$Group %in% group, ]
  meanScore <- sub[sub$YearWeek %in% yearWeek, ]$Score1
  addColumn <- c(addColumn, score - meanScore)
}

# Combine
cbind(employ.data, addColumn)
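As written, the loop subtracts the mean of each row's own Group/YearWeek cell: for group A rows that is the control mean, while B and C rows get their own group's mean. A vectorized base-R sketch of that same calculation, without reshape or the loop, assuming the question's data (rebuilt compactly; `cell_mean` is an illustrative name):

```r
# The question's data, written compactly (same rows, same order)
employ.data <- data.frame(
  employee = 1:20,
  Group    = rep(rep(c("A", "B", "C"), times = c(3, 4, 3)), 2),
  YearWeek = rep(c("201401", "201402"), each = 10),
  Score1   = c(404, 440, 395, 500, 450, 476, 350, 500, 600, 575,
               460, 445, 400, 508, 470, 422, 368, 555, 700, 634),
  stringsAsFactors = FALSE
)

# Mean of Score1 within each Group x YearWeek cell, aligned to the rows
cell_mean <- ave(employ.data$Score1, employ.data$Group, employ.data$YearWeek)

employ.data$addColumn <- employ.data$Score1 - cell_mean
```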

Upvotes: 0

MrFlick

Reputation: 206411

Here's a strategy that I believe will work.

First, calculate the mean for group A for each YearWeek:

ctrlmeans <- with(subset(employ.data, Group=="A"), tapply(Score1, YearWeek, mean))

That returns a vector named by YearWeek. We can then use the YearWeek column of the data.frame as a lookup into that table to subtract off the matching mean. We can do that with

Difference <- employ.data$Score1 - ctrlmeans[as.character(employ.data$YearWeek)]

and then add it back to the data.frame:

employ.data$Difference <- Difference
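A self-contained sketch of the same steps, assuming the question's data (rebuilt compactly). It indexes `ctrlmeans` via `as.character()` because indexing by a factor uses the factor's integer codes rather than its labels, and before R 4.0.0 `data.frame()` turned `YearWeek` into a factor by default. The factor `yw` is a hypothetical example built only to show that difference:

```r
# The question's data, written compactly (same rows, same order)
employ.data <- data.frame(
  employee = 1:20,
  Group    = rep(rep(c("A", "B", "C"), times = c(3, 4, 3)), 2),
  YearWeek = rep(c("201401", "201402"), each = 10),
  Score1   = c(404, 440, 395, 500, 450, 476, 350, 500, 600, 575,
               460, 445, 400, 508, 470, 422, 368, 555, 700, 634),
  stringsAsFactors = FALSE
)

# Control-group mean per week, named by YearWeek
ctrlmeans <- with(subset(employ.data, Group == "A"), tapply(Score1, YearWeek, mean))

# Look each row's week up by name and subtract
Difference <- employ.data$Score1 - ctrlmeans[as.character(employ.data$YearWeek)]
employ.data$Difference <- as.vector(Difference)

# Hypothetical factor whose level order differs from the names of ctrlmeans
yw <- factor(c("201402", "201401"), levels = c("201402", "201401"))
ctrlmeans[yw]               # positional (codes 1, 2): values for 201401, 201402
ctrlmeans[as.character(yw)] # by name: values for 201402, 201401
```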

Upvotes: 2
