How can I modify a particular field in a list of data frames?

Question

Suppose I write the following R code:

first.value <- sample(100, 100, replace=TRUE)
second.value <- sample(10, 100, replace=TRUE)

X <- data.frame(first.value, second.value)
split.X <- split(X, second.value)

This code creates a data frame with two fields and splits it into bins according to the second field. Now suppose I want to normalize first.value within each bin; i.e., subtract the bin's mean and divide by its standard deviation. I could accomplish this with

normalized.first.value <- sapply(split.X, function(X) {(X$first.value - mean(X$first.value)) / sd(X$first.value)})

But this creates a new list with the normalized versions of each bin. What I really want to do is replace the copy of the data in split.X with its normalized version.

To illustrate, here's some sample output:

> first.value <- sample(100, 100, replace=TRUE)
> second.value <- sample(10, 100, replace=TRUE)
> X <- data.frame(first.value, second.value)
> split.X <- split(X, second.value)
> normalized.first.value <- sapply(split.X, function(X) {(X$first.value - mean(X$first.value)) / sd(X$first.value)})
> split.X[[1]]
   first.value second.value
4           34            1
8           40            1
24          21            1
31          34            1
37          23            1
40          22            1
> normalized.first.value[[1]]
[1]  0.625  1.375 -1.000  0.625 -0.750 -0.875

What I really want to do is to put the values of normalized.first.value[[1]] into split.X[[1]]$first.value, and the same for the other indices.

This could be achieved with a for loop as follows:

for (i in seq_along(split.X)) {
  # subtract the bin mean first, then divide by the bin standard deviation
  split.X[[i]]$first.value <- (split.X[[i]]$first.value - mean(split.X[[i]]$first.value)) / sd(split.X[[i]]$first.value)
}

But for loops are BAD in R, and I'd like to use sapply, lapply, etc. if I can. Unfortunately, when dealing with a list of data frames, sapply and lapply don't seem to iterate in the way I want.

akrun · Accepted Answer

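You can use Map, since both lists have the same length. It replaces the first column of each data frame in 'split.X' with the corresponding list element of 'normalized.first.value'. Map returns a new list, so assign the result back to 'split.X' if you want to overwrite the original.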

  Map(function(x, y) {x[['first.value']] <- y; x}, split.X, normalized.first.value)
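
As a quick check of the Map approach, here is a minimal sketch on fixed data instead of random samples. The six values in bin 1 are taken from the question's sample output; the second bin (10, 20, 30) is made up for illustration. The sketch uses lapply rather than sapply to build normalized.first.value, on the assumption that a list is always wanted: sapply would simplify to a matrix if every bin happened to have the same number of rows.

```r
# Bin 1 values come from the question's sample output;
# bin 2 (10, 20, 30) is hypothetical, added for illustration.
first.value  <- c(34, 40, 21, 34, 23, 22, 10, 20, 30)
second.value <- c(1, 1, 1, 1, 1, 1, 2, 2, 2)
X <- data.frame(first.value, second.value)
split.X <- split(X, X$second.value)

# lapply always returns a list, one numeric vector per bin.
normalized.first.value <- lapply(split.X, function(d) {
  (d$first.value - mean(d$first.value)) / sd(d$first.value)
})

# Map pairs each data frame with its normalized vector and returns
# a new list, so assign it back to overwrite split.X.
split.X <- Map(function(x, y) { x[["first.value"]] <- y; x },
               split.X, normalized.first.value)

split.X[["1"]]$first.value
```

For bin 1 this should reproduce the normalized values shown in the question (0.625, 1.375, -1, 0.625, -0.75, -0.875), and the bin names "1" and "2" carry over from split.X.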

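Or we can loop over the indices of 'split.X', extract the matching elements of 'split.X' and 'normalized.first.value', and replace the column. Because lapply iterates over the indices here rather than over the list itself, the result is an unnamed list.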

  lapply(seq_along(split.X), function(i) {
    x1 <- split.X[[i]]
    x1[, 'first.value'] <- normalized.first.value[[i]]
    x1
  })
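
If the intermediate list normalized.first.value is not needed for anything else, one possible variation is to do the normalization inside lapply and return the whole data frame from the function. This is only a sketch under that assumption; the helper name normalize is hypothetical, bin 1 reuses the values from the question's sample output, and bin 2 is invented.

```r
# Bin 1 values come from the question's sample output; bin 2 is hypothetical.
X <- data.frame(first.value  = c(34, 40, 21, 34, 23, 22, 10, 20, 30),
                second.value = c(1, 1, 1, 1, 1, 1, 2, 2, 2))
split.X <- split(X, X$second.value)

# Hypothetical helper: standardize a numeric vector.
# Note that sd() returns NA for a vector of length 1,
# so a bin with a single row would end up as NA.
normalize <- function(v) (v - mean(v)) / sd(v)

# Modify the column inside the function and return the whole data frame.
# Iterating over the list itself (not its indices) keeps the bin names.
split.X <- lapply(split.X, function(d) {
  d$first.value <- normalize(d$first.value)
  d
})

names(split.X)
```

The key point is that the anonymous function returns the modified data frame d, not just the normalized vector, so each list element keeps its other columns.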
