Reputation: 93
I hope it's not a duplicate, but I searched hard and didn't find the answer.
So, I have a big data.table (>50000 observations), here's the head:
   measure condition subject channel     score
1:     LZs      dark      03       1 0.5589379
2:     LZs      dark      03       2 0.5225509
3:     LZs      dark      03       3 0.5988951
4:     LZs      dark      03       4 0.5475331
5:     LZs      dark      03       5 0.5468930
6:     LZs      dark      03       6 0.5431141
I want to create a new column such as
data$diff = data$score - data$score[data$condition%in%"dark"]
I have 9 different measures, 5 conditions, 18 subjects and 64 channels, so I can't check line by line whether I get the expected result. Still, a random spot check in the data showed that the result was not what I expected.
How can I be SURE that this simple operation uses the "dark" score of the right measure, subject and channel each time?
Of course, I could write several for loops, but that's not nice R code. I assume it could be done using dplyr, but I'm not familiar with it, and a simple mutate() didn't work any better.
Upvotes: 2
Views: 205
Reputation: 887048
Assuming that we need to get the difference for each 'measure' and 'subject', specify 'measure' and 'subject' in by, then subtract from 'score' the elements of 'score' where 'condition' is 'dark' (the lengths are assumed to match, so the rows must be in the same order within each condition)
library(data.table)
data[, Diff := score - score[condition == "dark"], by = .(measure, subject)]
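Since the question asks for the "dark" score of the same measure, subject and channel, one possible variant is to add 'channel' to the grouping as well. Each group then contains exactly one "dark" row, so the result no longer depends on row order. The following is only a sketch on toy data: the condition names other than "dark", the extra measure and subject values, and the helper names baseline and check are made up for illustration, and it assumes every measure/subject/channel combination has exactly one "dark" row.

```r
library(data.table)

# Toy data (hypothetical values): 2 measures x 3 conditions x 2 subjects x 4 channels
set.seed(1)
data <- CJ(measure   = c("LZs", "LZc"),
           condition = c("dark", "light", "dim"),
           subject   = c("03", "04"),
           channel   = 1:4)
data[, score := runif(.N)]

# Shuffle the rows so the result cannot silently rely on row order
data <- data[sample(.N)]

# One group per measure/subject/channel; each group holds exactly one "dark" row,
# so score[condition == "dark"] has length 1 and is subtracted from every row
data[, Diff := score - score[condition == "dark"],
     by = .(measure, subject, channel)]

# Independent check: join the "dark" baseline back on and recompute the difference
baseline <- data[condition == "dark",
                 .(measure, subject, channel, dark_score = score)]
check <- merge(data, baseline, by = c("measure", "subject", "channel"))
stopifnot(isTRUE(all.equal(check$Diff, check$score - check$dark_score)))

# The "dark" rows themselves must have a difference of exactly zero
stopifnot(all(data[condition == "dark", Diff] == 0))
```

The join-based check is what answers the "how to be SURE" part: it recomputes the difference by an independent route, so a mismatch in grouping would make stopifnot() fail instead of going unnoticed.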
Upvotes: 2