monkeyshines

Reputation: 1078

Allocating value to variable if a condition is met in data.table

I have a big data set with the following variables

library(data.table)
student_ID=c(rep("1001",8),rep("1002",3),rep("1003",11))
grades=c(NA,rep(40,2),50,60,90, 5,NA,51, rep(47,5),rep(70,5),rep(42,3))
Year=c(rep(2011,4),rep(2012,4),2011,2012,2013,rep(2011,4),rep(2012,3),rep(2013,4))
data<-data.table(student_ID,grades,Year)
setkey(data, student_ID)

I need to create two new variables: one for the average grade per student, and one indicating whether that average grade is below 50 (1 if yes, 0 if no) in any given year.

Once this is done, I will look at the subset at the student and year level.

Upvotes: 0

Views: 99

Answers (1)

akrun

Reputation: 887213

To create two columns grouped by 'student_ID', assign (with :=) the outputs (mean(grades) and the binary indicator) to new column names.

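# mean per student (NAs ignored) and a 0/1 flag for mean < 50, added by reference
data[, c('MeanGrade', 'MeanGradelessthan50') := {
  tmp <- mean(grades, na.rm=TRUE)
  list(tmp, +(tmp < 50))   # +() turns TRUE/FALSE into 1/0
}, by = student_ID]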
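Because := writes the same per-student values onto every row of that student, collapsing to one row per student is a quick way to check the result. A minimal sketch, assuming the data.table package is installed; it rebuilds the question's data so it runs on its own, and the name per_student is only illustrative:

```r
library(data.table)

# Rebuild the example data from the question
student_ID <- c(rep("1001", 8), rep("1002", 3), rep("1003", 11))
grades <- c(NA, rep(40, 2), 50, 60, 90, 5, NA, 51, rep(47, 5), rep(70, 5), rep(42, 3))
Year <- c(rep(2011, 4), rep(2012, 4), 2011, 2012, 2013, rep(2011, 4), rep(2012, 3), rep(2013, 4))
data <- data.table(student_ID, grades, Year)
setkey(data, student_ID)

# Per-student mean (NAs ignored) and 0/1 flag, added as new columns by reference
data[, c("MeanGrade", "MeanGradelessthan50") := {
  tmp <- mean(grades, na.rm = TRUE)
  list(tmp, +(tmp < 50))
}, by = student_ID]

# Every row of a student carries the same values, so unique() gives one row per student
per_student <- unique(data[, .(student_ID, MeanGrade, MeanGradelessthan50)])
print(per_student)
```

With this data, students 1001 (mean 47.5) and 1002 (mean about 48.33) are flagged with 1, while 1003 (mean about 56.09) gets 0.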

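To summarise instead of assigning, return a named list in j (here grouped by both student and year):

# one row per student-year with the mean and the 0/1 flag
data[, {
  tmp <- mean(grades, na.rm=TRUE)
  list(MeanGrade=tmp, MeanGradelessthan50 = +(tmp < 50))
}, by = .(student_ID, Year)]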
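If the per-year values are wanted as columns on the original rows (rather than as a separate summary table), the same := pattern can be grouped by both student_ID and Year. A minimal sketch, assuming data.table is installed; the column names MeanGradeYear and Below50Year and the object per_year are hypothetical:

```r
library(data.table)

# Rebuild the example data from the question
student_ID <- c(rep("1001", 8), rep("1002", 3), rep("1003", 11))
grades <- c(NA, rep(40, 2), 50, 60, 90, 5, NA, 51, rep(47, 5), rep(70, 5), rep(42, 3))
Year <- c(rep(2011, 4), rep(2012, 4), 2011, 2012, 2013, rep(2011, 4), rep(2012, 3), rep(2013, 4))
data <- data.table(student_ID, grades, Year)

# Per student-year mean (NAs ignored) and 0/1 flag, written onto every row of that group
data[, c("MeanGradeYear", "Below50Year") := {
  tmp <- mean(grades, na.rm = TRUE)
  list(tmp, +(tmp < 50))
}, by = .(student_ID, Year)]

# One row per student-year for inspection
per_year <- unique(data[, .(student_ID, Year, MeanGradeYear, Below50Year)])
print(per_year)
```

Note that a student-year whose grades are all NA would give NaN for the mean and NA for the flag, since mean(numeric(0)) is NaN.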

Upvotes: 2
