Nicso

Reputation: 143

Replace loop with one of the functions of the "apply" family

I have data frames stored in a list and, normally, when I want to center the data, I use a loop (as in the example below). I would like to use a function from the "apply" family instead, but I cannot figure out how to write the code.

An example of my data:

env <- list (data.frame(a=c(-1.08, -1.07, -1.07),
                        b=c( 4.61,  4.59,  4.59),
                        c=c( 3.46,  3.56,  3.52)),
             data.frame(a=c( 3.93,  3.94,  3.92),
                        b=c(-6.69, -6.72, -6.68),
                        c=c( 3.04,  3.08,  3.03)))

The values I will use to center them:

d <- c(a=10.20, b=-10.91, c=11.89)

The type of loop that I commonly use:

for(i in 1:length(env)) {
    env[[i]][, 1] <- env[[i]][, 1] - d[1]
    env[[i]][, 2] <- env[[i]][, 2] - d[2]
    env[[i]][, 3] <- env[[i]][, 3] - d[3]
}

Is there a way to use a function of the "apply" family to do the same thing I did in the above loop?

Upvotes: 10

Views: 1038

Answers (6)

Japie

Reputation: 51

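You can also use purrr's map functions to accomplish the same thing. Specifically, use map() to loop over the list env and then map2() to loop in parallel over the columns of each data frame (env[[1]] and env[[2]]) and the values in d. The expression j-k is where the data gets centered. Note that the result is a list of lists of numeric vectors rather than a list of data frames.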

library('purrr')
map(env, function(i){
  map2(i, d, function(j,k){
    j-k
  })
})

yielding,

[[1]]
[[1]]$a
[1] -11.28 -11.27 -11.27

[[1]]$b
[1] 15.52 15.50 15.50

[[1]]$c
[1] -8.43 -8.33 -8.37


[[2]]
[[2]]$a
[1] -6.27 -6.26 -6.28

[[2]]$b
[1] 4.22 4.19 4.23

[[2]]$c
[1] -8.85 -8.81 -8.86
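
If purrr is not available, or a list of data frames is wanted rather than a list of lists, a base R counterpart can be sketched with Map(). This is only an illustration under those assumptions, not part of the answer above; the names centered, df, col and ctr are made up for the example.

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

# Outer lapply() walks the list; Map() pairs each column with its value in d.
# as.data.frame() turns the resulting list of columns back into a data frame.
centered <- lapply(env, function(df) {
  as.data.frame(Map(function(col, ctr) col - ctr, df, d))
})
```

Map() pairs elements by position, so this assumes the columns of each data frame are in the same order as the values in d.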

Upvotes: 0

Nicso

Reputation: 143

Thank you very much for the quick and interesting answers.

I ran all the solutions that you posted, inside the microbenchmark::microbenchmark function.

For the solutions that produce a list of matrices, I added (using only my current knowledge of R) an extra line to transform them into lists of data frames.

env1 <- env
env2 <- env
env3 <- env
env4 <- env
env5 <- env
env6 <- env
env7 <- env

## install.packages("microbenchmark")
library("microbenchmark")
microbenchmark(
## 1; the original.
for(i in 1:length(env1)) {
    env1[[i]][, 1] <- env1[[i]][, 1] - d[1]
    env1[[i]][, 2] <- env1[[i]][, 2] - d[2]
    env1[[i]][, 3] <- env1[[i]][, 3] - d[3]}
,

## 2
for(i in 1:length(env2)) {
    for (j in 1:length(env2[[i]])) {
        env2[[i]][, j] <- env2[[i]][, j] - d[j]
    }
}
,

## 3
{env3 <- lapply(env3, function(i) t(t(i) - d))
env3 <- lapply(env3, function(i) as.data.frame(i))}
,

## 4
{env4 <- lapply(env4, scale, center=d, scale=FALSE)
env4 <- lapply(env4, function(i) as.data.frame(i))}
,

## 5
{nrows <- 3
env5 <- lapply(env5, function(x) x - matrix(rep(d, nrows), nrow = nrows, byrow = TRUE))}
,

## 6
env6 <- lapply(env6, sweep, 2, d, "-")
,

## 7
{env7 <- lapply(lapply(lapply(env7, t), "-", d), t)
env7 <- lapply(env7, function(i) as.data.frame(i))}
)

## install.packages("compare")
library("compare")
identical(env1, env2)
identical(env1, env3)
identical(env1, env4)
identical(env1, env5)
identical(env1, env6)
identical(env1, env7)

As you will see, all the lines produce identical objects.

After running microbenchmark 5 times, solution ## 7 in the code above is the fastest, although solution ## 3 is only slightly slower.
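
One caveat, stated as an assumption about how microbenchmark evaluates its arguments: each expression is run many times in the calling environment, and the expressions above reassign env1 to env7, so those copies may have been centered repeatedly by the time identical() is called. A minimal sketch that compares a few of the approaches on fresh copies, using all.equal() to allow for floating point tolerance, could look like this. The helper same_values and the by_* names are hypothetical.

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

# Every approach starts from the untouched `env`, so nothing is centered twice.
by_loop <- env
for (i in seq_along(by_loop)) {
  for (j in seq_along(by_loop[[i]])) {
    by_loop[[i]][, j] <- by_loop[[i]][, j] - d[j]
  }
}
by_transpose <- lapply(env, function(df) as.data.frame(t(t(df) - d)))
by_sweep     <- lapply(env, sweep, 2, d, "-")

# Compare numeric values only, ignoring attributes such as dimnames.
same_values <- function(x, y) {
  isTRUE(all.equal(lapply(x, as.matrix), lapply(y, as.matrix),
                   check.attributes = FALSE))
}
agree_transpose <- same_values(by_loop, by_transpose)
agree_sweep     <- same_values(by_loop, by_sweep)
```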

I will study in detail each of the solutions you proposed and, again, thank you very much!

As a token of appreciation, enjoy this song I really like! https://www.youtube.com/watch?v=QnguI5OrfZ4

Greetings!

Upvotes: 0

catastrophic-failure

Reputation: 3905

My version of PoGibas' answer (+1):

lapply(lapply(lapply(env, t), "-", d), t)

It does exactly the same thing:

  • Transpose the data.frame objects
  • Subtract d using recycling rules
  • Transpose them back to the original position
  • It returns matrix objects, which is not what OP wanted.

I thought that, because it relies more heavily on vectorization, it would end up being a bit faster. That is not the case, though.

library(microbenchmark)
microbenchmark(
  f1 = lapply(env, function(i) t(t(i) - d)),
  f2 = lapply(lapply(lapply(env, t), "-", d), t), times = 1E5L)
#Unit: microseconds
# expr     min      lq     mean  median      uq        max neval cld
#   f1  99.838 103.104 114.8280 104.970 108.702 106230.106 1e+05  a 
#   f2 103.570 107.303 118.9683 110.102 113.834   7765.414 1e+05   b

Upvotes: 0

G. Grothendieck

Reputation: 269624

1) sweep. Use sweep, which produces a list of data frames:

lapply(env, sweep, 2, d, "-")

giving:

[[1]]
       a     b     c
1 -11.28 15.52 -8.43
2 -11.27 15.50 -8.33
3 -11.27 15.50 -8.37

[[2]]
      a    b     c
1 -6.27 4.22 -8.85
2 -6.26 4.19 -8.81
3 -6.28 4.23 -8.86

Also see How to divide each row of a matrix by elements of a vector in R for numerous expressions that are equivalent or nearly equivalent to sweep.

2) scale. Alternatively, use scale like this; however, it gives a list of numeric matrices rather than a list of data frames:

lapply(env, scale, d, FALSE)

giving:

[[1]]
          a     b     c
[1,] -11.28 15.52 -8.43
[2,] -11.27 15.50 -8.33
[3,] -11.27 15.50 -8.37
attr(,"scaled:center")
     a      b      c 
 10.20 -10.91  11.89 

[[2]]
         a    b     c
[1,] -6.27 4.22 -8.85
[2,] -6.26 4.19 -8.81
[3,] -6.28 4.23 -8.86
attr(,"scaled:center")
     a      b      c 
 10.20 -10.91  11.89 
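
If data frames are needed from the scale approach, one option is to wrap the call in as.data.frame(). This is a small sketch of that idea, not part of the answer itself; the name centered is chosen for illustration.

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

# scale() subtracts d column by column and returns a numeric matrix;
# as.data.frame() converts each matrix back to a data frame.
centered <- lapply(env, function(df) {
  as.data.frame(scale(df, center = d, scale = FALSE))
})
```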

Upvotes: 3

s_baldur

Reputation: 33488

Here is a hacky solution using lapply:

nrows <- 3
lapply(env, function(x) x - matrix(rep(d, nrows), nrow = nrows, byrow = TRUE))
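
The snippet above hard-codes nrows <- 3. A variation that reads the row count from each data frame might look like the sketch below; center_rows is a hypothetical helper name, and the approach is otherwise the same matrix subtraction.

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

# Build a matrix with one copy of `centers` per row of x, then subtract it.
# byrow = TRUE fills row by row, so every row of the matrix equals `centers`.
center_rows <- function(x, centers) {
  x - matrix(centers, nrow = nrow(x), ncol = length(centers), byrow = TRUE)
}

centered <- lapply(env, center_rows, centers = d)

# Also works when a data frame has a different number of rows.
two_rows <- center_rows(data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6)), d)
```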

Upvotes: 1

pogibas

Reputation: 28339

There are two things you can simplify here: looping over the list elements and subtracting each value in d separately.

To replace the for loop you can use lapply (the "l" stands for list, since we're iterating over a list).

# Run a function for every element i in list env
# (the body here is a placeholder that returns i unchanged)
lapply(env, function(i) i)

To simplify the subtraction you can:

  1. Transpose the data frame: t(i)
  2. Perform the subtraction: t(i) - d
  3. Transpose it back: t(t(i) - d)

So the final code would be:

lapply(env, function(i) t(t(i) - d))
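
The transposition works because of how R recycles a shorter vector over a matrix. The sketch below spells that out and, as an optional extra that is not part of the answer, wraps the result in as.data.frame() since t() returns a matrix. The names m, wrong, right and centered are illustrative.

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

m <- as.matrix(env[[1]])

# Matrices are stored column by column, so `m - d` recycles d DOWN each
# column: row 1 gets d[1], row 2 gets d[2], row 3 gets d[3]. Not what we want.
wrong <- m - d

# In t(m) the rows are a, b, c, so recycling d down each column pairs
# a with d["a"], b with d["b"], c with d["c"]. Transposing back restores the shape.
right <- t(t(m) - d)

# Same idea for every list element, converted back to data frames.
centered <- lapply(env, function(i) as.data.frame(t(t(i) - d)))
```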

Upvotes: 8
