Reputation: 2736
Suppose I write the following R code:
first.value <- sample(100, 100, replace=TRUE)
second.value <- sample(10, 100, replace=TRUE)
X <- data.frame(first.value, second.value)
split.X <- split(X, second.value)
This code creates a data frame with two fields and splits it into bins according to the second. Now suppose I want to normalize each bin, i.e., subtract the mean and divide by the standard deviation. I could accomplish this with:
normalized.first.value <- sapply(split.X, function(X) {(X$first.value - mean(X$first.value)) / sd(X$first.value)})
But this creates a new list with the normalized versions of each bin. What I really want to do is replace the copy of the data in split.X with its normalized version.
To illustrate, here's some sample output:
> first.value <- sample(100, 100, replace=TRUE)
> second.value <- sample(10, 100, replace=TRUE)
> X <- data.frame(first.value, second.value)
> split.X <- split(X, second.value)
> normalized.first.value <- sapply(split.X, function(X) {(X$first.value - mean(X$first.value)) / sd(X$first.value)})
> split.X[[1]]
   first.value second.value
4           34            1
8           40            1
24          21            1
31          34            1
37          23            1
40          22            1
> normalized.first.value[[1]]
[1] 0.625 1.375 -1.000 0.625 -0.750 -0.875
What I really want to do is to put the values of normalized.first.value[[1]] into split.X[[1]]$first.value, and the same for the other indices.
This could be achieved with a for loop as follows:
for (i in seq_along(split.X)) {
  v <- split.X[[i]]$first.value
  # subtract the bin mean, then divide by the bin standard deviation
  split.X[[i]]$first.value <- (v - mean(v)) / sd(v)
}
But for loops are BAD in R, and I'd like to use sapply, lapply, etc. if I can. Unfortunately, when dealing with a list of data frames, sapply and lapply don't seem to iterate in the way I want.
Upvotes: 3
Views: 47
Reputation: 56905
Here's a more arcane way (though I still reckon the for loop is fine in this case):
new.split.X <- mapply(`[<-`, split.X, TRUE, 'first.value', normalized.first.value,
                      SIMPLIFY=FALSE)
How it works: mapply applies [<- to each split.X[[i]]. The TRUE is the i index to replace (i.e. all rows), 'first.value' is the j index to replace (that column), and normalized.first.value[[i]] supplies the replacement values. SIMPLIFY=FALSE keeps the result as a list of data frames.
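To make the [<- trick less opaque, here is a minimal sketch on a single bin. It reuses the six first.value entries shown in the question's sample output; the names bin, z, a and b are made up for the illustration. It shows that the usual assignment syntax and the explicit function call do the same thing, which is the call mapply repeats once per bin.

```r
# One bin, using the values from the question's sample output
bin <- data.frame(first.value = c(34, 40, 21, 34, 23, 22), second.value = 1)
z <- (bin$first.value - mean(bin$first.value)) / sd(bin$first.value)

# Replacement written with the usual assignment syntax ...
a <- bin
a[TRUE, "first.value"] <- z

# ... and the same replacement as an explicit function call:
# arguments are (data frame, i index, j index, replacement values)
b <- `[<-`(bin, TRUE, "first.value", z)

identical(a, b)   # both forms give the same data frame
b$first.value     # same values as normalized.first.value[[1]] in the question
```

Note that the function-call form returns a modified copy and leaves bin untouched, which is why the mapply result has to be assigned to something.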
The loop may be easier to read in the end though, and probably not slower than tricksy *apply solutions.
library(rbenchmark)
benchmark(loop={
            for (i in seq_along(split.X))
              split.X[[i]]$first.value <- normalized.first.value[[i]]
          },
          mapply={
            mapply(`[<-`, split.X, TRUE, 'first.value', normalized.first.value,
                   SIMPLIFY=FALSE)
          },
          Map={
            Map(function(x, y) {x[['first.value']] <- y; x}, split.X, normalized.first.value)
          },
          lapply={
            lapply(seq_along(split.X), function(i) {
              x1 <- split.X[[i]]
              x1[, 'first.value'] <- normalized.first.value[[i]]
              x1})
          })
    test replications elapsed relative user.self sys.self user.child sys.child
4 lapply          100   0.034    4.857     0.035        0          0         0
1   loop          100   0.007    1.000     0.007        0          0         0
3    Map          100   0.012    1.714     0.013        0          0         0
2 mapply          100   0.030    4.286     0.032        0          0         0
So in this benchmark the explicit loop is the fastest, and it is the easiest to read anyway.
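Timings only mean something if the variants produce the same result, so here is a rough sketch that checks this on a small deterministic data set. The toy data and the by.* names are assumptions made for the example, not part of the benchmark above, and lapply is used instead of sapply so the normalized values are always a list.

```r
# Toy data: 12 rows in 3 bins of 4 rows each (made up for this check)
X <- data.frame(first.value = 1:12, second.value = rep(1:3, times = 4))
split.X <- split(X, X$second.value)
normalized.first.value <- lapply(split.X, function(d)
  (d$first.value - mean(d$first.value)) / sd(d$first.value))

by.loop <- split.X
for (i in seq_along(by.loop))
  by.loop[[i]]$first.value <- normalized.first.value[[i]]

by.mapply <- mapply(`[<-`, split.X, TRUE, "first.value", normalized.first.value,
                    SIMPLIFY = FALSE)
by.Map <- Map(function(x, y) { x[["first.value"]] <- y; x },
              split.X, normalized.first.value)
by.lapply <- lapply(seq_along(split.X), function(i) {
  x1 <- split.X[[i]]
  x1[, "first.value"] <- normalized.first.value[[i]]
  x1
})

all.equal(by.loop, by.mapply)
all.equal(by.loop, by.Map)
# Iterating over indices drops the bin names, so compare without them
all.equal(unname(by.loop), by.lapply)
names(by.lapply)
```

The one visible difference is that the index-based lapply returns an unnamed list, while the loop, mapply and Map versions keep the bin names from split.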
Upvotes: 2
Reputation: 887028
You can use Map, as both lists have the same length. It replaces the 'first.value' column of each data frame in 'split.X' with the corresponding list element of 'normalized.first.value':
Map(function(x,y) {x[['first.value']] <- y;x} ,split.X, normalized.first.value)
Or we can loop over the indices of 'split.X', extract the matching elements of 'split.X' and 'normalized.first.value' by index, and then replace the column:
lapply(seq_along(split.X), function(i) {
x1 <- split.X[[i]]
x1[,'first.value'] <- normalized.first.value[[i]]
x1})
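If the intermediate 'normalized.first.value' list is not needed for anything else, the two steps can also be folded into one pass. The following is only a sketch on made-up data (the first bin reuses the values from the question's sample output): the function passed to lapply modifies its local copy of the bin and returns the whole data frame, so the result can be assigned straight back to 'split.X'.

```r
# Toy data: bin 1 reuses the question's sample values, bin 2 is made up
X <- data.frame(first.value  = c(34, 40, 21, 34, 23, 22, 10, 20, 30),
                second.value = c(1, 1, 1, 1, 1, 1, 2, 2, 2))
split.X <- split(X, X$second.value)

# Each call receives one bin, normalizes the column in its local copy,
# and returns the full data frame rather than just the new column
split.X <- lapply(split.X, function(d) {
  d$first.value <- (d$first.value - mean(d$first.value)) / sd(d$first.value)
  d
})

split.X[["1"]]$first.value
names(split.X)   # iterating over the list itself keeps the bin names
```

Returning d on the last line is the important part; without it the function would return only the assigned vector, which is the behavior described in the question.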
Upvotes: 1