Reputation: 21461
I have a multilevel structure, and I need to standardize within each individual (the higher-level unit, each of which has several separate measures).
Consider:
ID measure score
1 1 1 5
2 1 2 7
3 1 3 3
4 2 1 10
5 2 2 5
6 2 3 3
7 3 1 4
8 3 2 1
9 3 3 1
I used apply(data, 2, scale) to standardize across everyone (this also standardizes the ID and measure columns, but that is all right).
However, how do I make sure to standardize separately for ID == 1, ID == 2 and ID == 3?
That is: (each observation - mean of the 3 scores) / (standard deviation of the 3 scores), computed within each ID.
I was considering a for loop, but the problem is that I want to bootstrap this (in other words, replicate the whole procedure 1000 times on a big dataset), so speed is VERY important.
Extra information: the IDs can have a varying number of measurements, so it is not the case that they all have 3 measured scores.
The dput of the data is:
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), measure = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), score = c(5L, 7L, 3L, 10L, 5L,
3L, 4L, 1L, 1L)), .Names = c("ID", "measure", "score"), class = "data.frame", row.names = c(NA,
-9L))
Upvotes: 1
Views: 129
Reputation: 61214
Here's a solution using lapply with split, assuming your data is DF:
> lapply(split(DF[,-1], DF[,1]), function(x) apply(x, 2, scale))
$`1`
measure score
[1,] -1 0
[2,] 0 1
[3,] 1 -1
$`2`
measure score
[1,] -1 1.1094004
[2,] 0 -0.2773501
[3,] 1 -0.8320503
$`3`
measure score
[1,] -1 1.1547005
[2,] 0 -0.5773503
[3,] 1 -0.5773503
An alternative that produces the same values (returned as a 3-dimensional array when every ID has the same number of rows) is:
> simplify2array(lapply(split(DF[,-1], DF[,1]), scale))
This alternative avoids calling apply inside the lapply call.
Here, split divides the data into groups defined by ID and returns a list, so you can use lapply to loop over each element of the list, applying scale to each one.
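To make the split / lapply steps concrete, here is a minimal sketch that standardizes only the score column and then puts the values back in the original row order with unsplit. It assumes DF is the data frame from the question; the names groups, z_by_id and score_z are made up for illustration. It should also work when IDs have different numbers of rows, though an ID with a single row would get NA because sd is undefined there.

```r
# Sample data from the question
DF <- data.frame(
  ID      = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  measure = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  score   = c(5L, 7L, 3L, 10L, 5L, 3L, 4L, 1L, 1L)
)

# Step 1: split() returns a list with one vector of scores per ID
groups <- split(as.numeric(DF$score), DF$ID)

# Step 2: lapply() standardizes each group: (x - mean) / sd
z_by_id <- lapply(groups, function(x) (x - mean(x)) / sd(x))

# Step 3: unsplit() reassembles the values in the original row order
DF$score_z <- unsplit(z_by_id, DF$ID)
```

After this, DF$score_z holds the per-ID standardized scores next to the raw ones, so the result stays a single data frame rather than a list.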
Using ddply from plyr, as @Roland suggests:
> library(plyr)
> ddply(DF, .(ID), numcolwise(scale))
ID measure score
1 1 -1 0.0000000
2 1 0 1.0000000
3 1 1 -1.0000000
4 2 -1 1.1094004
5 2 0 -0.2773501
6 2 1 -0.8320503
7 3 -1 1.1547005
8 3 0 -0.5773503
9 3 1 -0.5773503
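If you would rather not depend on plyr, base R's ave can produce the same per-ID scores as a plain vector aligned with the rows of DF. This is only a sketch under the same assumption about DF; zscore and score_z are hypothetical names, not part of any package.

```r
# Sample data from the question
DF <- data.frame(
  ID      = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  measure = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  score   = c(5L, 7L, 3L, 10L, 5L, 3L, 4L, 1L, 1L)
)

# Hypothetical helper: z-score of a numeric vector
zscore <- function(x) (x - mean(x)) / sd(x)

# ave() applies zscore within each ID and returns one value per row,
# in the same order as the rows of DF
DF$score_z <- ave(as.numeric(DF$score), DF$ID, FUN = zscore)
```

Whether this is fast enough for 1000 bootstrap replicates is something to measure on your own data, for example with system.time.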
Importing your data (this is to answer the last comment)
DF <- read.table(text=" ID measure score
1 1 1 5
2 1 2 7
3 1 3 3
4 2 1 10
5 2 2 5
6 2 3 3
7 3 1 4
8 3 2 1
9 3 3 1", header=TRUE)
Upvotes: 3