Standardize not among columns, but small parts of columns, using R

Question

I have a multilevel structure, and what I need to do is standardize for each individual (which is the higher level unit, each having several separate measures).

Consider:

  ID measure score
1  1       1     5
2  1       2     7
3  1       3     3
4  2       1    10
5  2       2     5
6  2       3     3
7  3       1     4
8  3       2     1
9  3       3     1

I used apply(data, 2, scale) to standardize for everyone (this also standardizes the ID and measure, but that is alright).

However, how do I make sure to standardize separately for ID == 1, ID == 2 and ID == 3? That is, for each observation: subtract the mean of that ID's 3 scores, then divide by the standard deviation of those 3 scores.

I was considering a for loop, but the problem is that I want to bootstrap this (in other words, replicate the whole procedure 1000 times for a big dataset, so speed is VERY important).

Extra information: the IDs can have a variable number of measurements, so it is not the case that they all have 3 measured scores.

The dput of the data is:

structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), measure = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), score = c(5L, 7L, 3L, 10L, 5L, 
3L, 4L, 1L, 1L)), .Names = c("ID", "measure", "score"), class = "data.frame", row.names = c(NA, 
-9L))

Jilber Urbina · Accepted Answer

Here's a solution using lapply with split, assuming your data is called DF:

> lapply(split(DF[,-1], DF[,1]), function(x) apply(x, 2, scale))
$`1`
     measure score
[1,]      -1     0
[2,]       0     1
[3,]       1    -1

$`2`
     measure      score
[1,]      -1  1.1094004
[2,]       0 -0.2773501
[3,]       1 -0.8320503

$`3`
     measure      score
[1,]      -1  1.1547005
[2,]       0 -0.5773503
[3,]       1 -0.5773503

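An alternative which produces the same standardized values (collected into an array rather than a list when all IDs have the same number of rows) is: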

> simplify2array(lapply(split(DF[,-1], DF[,1]), scale))

This alternative avoids calling apply inside the lapply call, because scale already standardizes each column of its input.
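To make the shape of that result concrete, here is a small sketch using the data from the question. The names arr, uneven and res are made up for illustration. With equal group sizes the result should be a 3-dimensional array (rows x columns x ID). With unequal group sizes, which the question says can happen, simplify2array cannot stack the pieces and hands back the list.

```r
# Data from the question.
DF <- data.frame(
  ID      = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  measure = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  score   = c(5L, 7L, 3L, 10L, 5L, 3L, 4L, 1L, 1L)
)

# Equal group sizes: one 3 x 2 matrix per ID, stacked along a third dimension.
arr <- simplify2array(lapply(split(DF[, -1], DF[, 1]), scale))
dim(arr)      # 3 2 3
arr[, 2, 2]   # standardized scores (column 2) for the second ID

# Unequal group sizes (hypothetical: drop the last row): the pieces no longer
# have a common shape, so the result stays a list of matrices.
uneven <- DF[-9, ]
res <- simplify2array(lapply(split(uneven[, -1], uneven[, 1]), scale))
is.list(res)  # TRUE
```

If the groups really do differ in size, the plain lapply version is probably the easier one to work with, since it always returns a list.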

Here, split divides the data into groups defined by ID and returns a list, so you can use lapply to loop over each element of the list, applying scale to each one.

Using ddply from plyr as @Roland suggests:

> library(plyr)
> ddply(DF, .(ID), numcolwise(scale))
  ID measure      score
1  1      -1  0.0000000
2  1       0  1.0000000
3  1       1 -1.0000000
4  2      -1  1.1094004
5  2       0 -0.2773501
6  2       1 -0.8320503
7  3      -1  1.1547005
8  3       0 -0.5773503
9  3       1 -0.5773503
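If only the score column needs standardizing and the result should stay aligned with the original rows, base R's ave is another option. This is a sketch, not a benchmarked recommendation: the column name score_z is made up for illustration, and whether it is fast enough for 1000 bootstrap replicates would need timing on the real data.

```r
# Data from the question.
DF <- data.frame(
  ID      = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  measure = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  score   = c(5L, 7L, 3L, 10L, 5L, 3L, 4L, 1L, 1L)
)

# ave() applies FUN within each level of ID and returns the results in the
# original row order, so they can be stored as a new column directly.
# as.numeric() is used so the input is double before it is standardized.
DF$score_z <- ave(as.numeric(DF$score), DF$ID,
                  FUN = function(x) (x - mean(x)) / sd(x))

DF
```

This works with any number of measurements per ID. Note that an ID with a single measurement gets NA, because sd of one value is NA.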

Importing your data (this is to answer the last comment):

DF <- read.table(text="  ID measure score
1  1       1     5
2  1       2     7
3  1       3     3
4  2       1    10
5  2       2     5
6  2       3     3
7  3       1     4
8  3       2     1
9  3       3     1", header=TRUE)
