John Gowers

Reputation: 2736

How can I modify a particular field in a list of data frames?

Suppose I write the following R code:

first.value <- sample(100, 100, replace=TRUE)
second.value <- sample(10, 100, replace=TRUE)

X <- data.frame(first.value, second.value)
split.X <- split(X, second.value)

This code creates a data frame with two fields and splits it into bins according to the second. Now suppose I want to normalize first.value within each bin; i.e., subtract the bin's mean and divide by its standard deviation. I could accomplish this with

normalized.first.value <- sapply(split.X, function(X) {(X$first.value - mean(X$first.value)) / sd(X$first.value)})

But this creates a new list with the normalized versions of each bin. What I really want to do is replace the copy of the data in split.X with its normalized version.

To illustrate, here's some sample output:

> first.value <- sample(100, 100, replace=TRUE)
> second.value <- sample(10, 100, replace=TRUE)
> X <- data.frame(first.value, second.value)
> split.X <- split(X, second.value)
> normalized.first.value <- sapply(split.X, function(X) {(X$first.value - mean(X$first.value)) / sd(X$first.value)})
> split.X[[1]]
   first.value second.value
4           34            1
8           40            1
24          21            1
31          34            1
37          23            1
40          22            1
> normalized.first.value[[1]]
[1]  0.625  1.375 -1.000  0.625 -0.750 -0.875

What I really want to do is to put the values of normalized.first.value[[1]] into split.X[[1]]$first.value, and the same for the other indices.

This could be achieved with a for loop as follows:

for (i in seq_along(split.X)) {
  v <- split.X[[i]]$first.value  # values of the current bin
  split.X[[i]]$first.value <- (v - mean(v)) / sd(v)
}

But for loops are BAD in R, and I'd like to use sapply, lapply, etc. if I can. Unfortunately, when dealing with a list of data frames, sapply and lapply don't seem to iterate in the way I want.

Upvotes: 3

Views: 47

Answers (2)

mathematical.coffee

Reputation: 56905

Here's a more arcane way (though I still reckon the for loop is fine in this case)

new.split.X <- mapply(`[<-`, split.X, T, 'first.value', normalized.first.value,
                      SIMPLIFY=F) 

How it works: mapply applies [<- to each split.X[[i]]. The T is the row index i to replace (TRUE, i.e. all rows), 'first.value' is the column index j, and normalized.first.value supplies the replacement values.
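To see what that call does for a single element, here is a minimal sketch on a toy data frame (the names df, new.values, copy and replaced are made up for illustration). It assumes nothing beyond base R: calling [<- as an ordinary function takes the same row index, column index and replacement as the usual assignment syntax, and returns the modified data frame as its value.

```r
# Toy data frame; `df` and `new.values` are illustrative names only.
df <- data.frame(first.value = c(10, 20, 30), second.value = c(1, 1, 1))
new.values <- c(-1, 0, 1)

# Usual assignment syntax, applied to a copy.
copy <- df
copy[TRUE, "first.value"] <- new.values

# Functional form: same arguments, result returned as a value.
replaced <- `[<-`(df, TRUE, "first.value", new.values)

# Both routes should give the same data frame.
identical(replaced, copy)
```

mapply simply makes this call once per bin, pairing each data frame with its own replacement vector.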

The loop may be easier to read in the end though, and probably not slower than tricksy *apply solutions.

library(rbenchmark)
benchmark(loop={
    for (i in 1:length(split.X))
        split.X[[i]]$first.value <- normalized.first.value[[i]]
  },
  mapply={
    mapply(`[<-`, split.X, T, 'first.value', normalized.first.value,
                          SIMPLIFY=F)
  },
  Map={
    Map(function(x,y) {x[['first.value']] <- y;x} ,split.X, normalized.first.value)
  },
  lapply={
    lapply(seq_along(split.X), function(i) {
             x1 <- split.X[[i]]
             x1[,'first.value'] <- normalized.first.value[[i]]
             x1})
  })
    test replications elapsed relative user.self sys.self user.child sys.child
4 lapply          100   0.034    4.857     0.035        0          0         0
1   loop          100   0.007    1.000     0.007        0          0         0
3    Map          100   0.012    1.714     0.013        0          0         0
2 mapply          100   0.030    4.286     0.032        0          0         0

So the explicit loop is the fastest, and easiest to read anyway.

Upvotes: 2

akrun

Reputation: 887028

You can use Map, as both lists have the same length. It replaces the 'first.value' column of each data frame in 'split.X' with the corresponding list element of 'normalized.first.value'.

  Map(function(x,y) {x[['first.value']] <- y;x} ,split.X, normalized.first.value)
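Note that Map returns a new list rather than changing split.X, so the result has to be assigned back if the goal is to overwrite the bins. A small sketch with made-up toy data standing in for split.X and normalized.first.value (the values are invented for illustration):

```r
# Toy stand-ins for split.X and normalized.first.value (values are made up).
split.X <- list(
  `1` = data.frame(first.value = c(34, 40, 21), second.value = 1),
  `2` = data.frame(first.value = c(5, 7), second.value = 2)
)
normalized.first.value <- lapply(split.X, function(d) {
  (d$first.value - mean(d$first.value)) / sd(d$first.value)
})

# Map returns a new list; assign it back to overwrite the bins.
split.X <- Map(function(x, y) { x[["first.value"]] <- y; x },
               split.X, normalized.first.value)

names(split.X)              # bin names are carried over from split.X
split.X[["2"]]$first.value  # normalized values for bin "2"
```

Because the first list passed to Map is named, the result keeps those names, so bins can still be looked up by their second.value label.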

Or we can loop over the indices of 'split.X', extract the matching elements of 'split.X' and 'normalized.first.value' by index, and then replace.

  lapply(seq_along(split.X), function(i) {
             x1 <- split.X[[i]]
             x1[,'first.value'] <- normalized.first.value[[i]]
             x1})
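If the intermediate normalized.first.value list is not needed for anything else, the normalization and the replacement can also be done in one pass by iterating over split.X itself. This is a sketch of that variant rather than code from the answer above; the fixed seed and the name bin are assumptions added only to make it reproducible and readable.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
first.value <- sample(100, 100, replace = TRUE)
second.value <- sample(10, 100, replace = TRUE)
X <- data.frame(first.value, second.value)
split.X <- split(X, X$second.value)

# Normalize first.value inside each bin and return the whole data frame,
# then overwrite split.X with the result.
split.X <- lapply(split.X, function(bin) {
  bin$first.value <- (bin$first.value - mean(bin$first.value)) / sd(bin$first.value)
  bin
})

# Iterating over the list itself (not over seq_along) keeps the bin names.
names(split.X)
```

One caveat: sd returns NA for a single value, so a bin with only one row ends up with NA in first.value.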

Upvotes: 1
