Reputation: 143
I have data.frames in lists and, normally, when I want to center the data, I use a loop (as seen in the example below). I would like to use a function of the "apply" family instead, but I cannot figure out how to write the code.
An example of my data:
env <- list(data.frame(a=c(-1.08, -1.07, -1.07),
                       b=c( 4.61,  4.59,  4.59),
                       c=c( 3.46,  3.56,  3.52)),
            data.frame(a=c( 3.93,  3.94,  3.92),
                       b=c(-6.69, -6.72, -6.68),
                       c=c( 3.04,  3.08,  3.03)))
The values I will use to center them:
d <- c(a=10.20, b=-10.91, c=11.89)
The type of loop that I commonly use:
for(i in 1:length(env)) {
  env[[i]][, 1] <- env[[i]][, 1] - d[1]
  env[[i]][, 2] <- env[[i]][, 2] - d[2]
  env[[i]][, 3] <- env[[i]][, 3] - d[3]
}
Is there a way to use a function of the "apply" family to do the same thing I did in the above loop?
Upvotes: 10
Views: 1038
Reputation: 51
You can also use the map functions to accomplish the same. Specifically, you can use map() to loop over the list env and then map2() to loop (concurrently) over d and the individual data frames, env[[1]] and env[[2]]. The expression j-k is where the data gets centered.
library('purrr')
map(env, function(i){
  map2(i, d, function(j,k){
    j-k
  })
})
yielding,
[[1]]
[[1]]$a
[1] -11.28 -11.27 -11.27

[[1]]$b
[1] 15.52 15.50 15.50

[[1]]$c
[1] -8.43 -8.33 -8.37


[[2]]
[[2]]$a
[1] -6.27 -6.26 -6.28

[[2]]$b
[1] 4.22 4.19 4.23

[[2]]$c
[1] -8.85 -8.81 -8.86
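The result above is a list of plain lists rather than a list of data frames. A minimal sketch of one way to get data frames back, assuming env and d as defined in the question. It uses base R's Map() as a stand-in for map2() so that it runs without purrr; the name centered_df is only illustrative.

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

# Map() pairs each column of a data frame with the matching element of d;
# as.data.frame() turns the resulting named list back into a data frame.
centered_df <- lapply(env, function(df) as.data.frame(Map(`-`, df, d)))
centered_df
```

Wrapping the inner map2() call in as.data.frame() should have a similar effect in the purrr version, although that variant is not shown here.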
Upvotes: 0
Reputation: 143
Thank you very much for the quick and interesting answers.
I ran all the solutions that you posted, inside the microbenchmark::microbenchmark function.
For the solutions that produce a list of matrices, I added (using only my current knowledge of R) an extra line to transform them into lists of data frames.
env1 <- env
env2 <- env
env3 <- env
env4 <- env
env5 <- env
env6 <- env
env7 <- env
## install.packages("microbenchmark")
library("microbenchmark")
microbenchmark(
## 1; the original.
for(i in 1:length(env1)) {
env1[[i]][, 1] <- env1[[i]][, 1] - d[1]
env1[[i]][, 2] <- env1[[i]][, 2] - d[2]
env1[[i]][, 3] <- env1[[i]][, 3] - d[3]}
,
## 2
for(i in 1:length(env2)) {
for (j in 1:length(env2[[i]])) {
env2[[i]][, j] <- env2[[i]][, j] - d[j]
}
}
,
## 3
{env3 <- lapply(env3, function(i) t(t(i) - d))
env3 <- lapply(env3, function(i) as.data.frame(i))}
,
## 4
{env4 <- lapply(env4, scale, center=d, scale=FALSE)
env4 <- lapply(env4, function(i) as.data.frame(i))}
,
## 5
{nrows <- 3
env5 <- lapply(env5, function(x) x - matrix(rep(d, nrows), nrow = nrows, byrow = TRUE))}
,
## 6
env6 <- lapply(env6, sweep, 2, d, "-")
,
## 7
{env7 <- lapply(lapply(lapply(env7, t), "-", d), t)
env7 <- lapply(env7, function(i) as.data.frame(i))}
)
## install.packages("compare")
library("compare")
identical(env1, env2)
identical(env1, env3)
identical(env1, env4)
identical(env1, env5)
identical(env1, env6)
identical(env1, env7)
As you will see, all the lines produce identical objects.
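One caveat, offered tentatively: the benchmarked expressions assign back to env1 ... env7 and are evaluated many times, so the objects compared above may have been centered more than once. A minimal sketch that compares freshly computed results instead, assuming env and d as defined in the question (to_df, ref, candidates and checks are hypothetical names introduced only for this illustration):

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

# Hypothetical helper: convert a list of matrices to a list of data frames.
to_df <- function(x) lapply(x, as.data.frame)

# Reference result from the explicit loop, computed exactly once.
ref <- env
for (i in seq_along(ref)) {
  for (j in seq_along(d)) ref[[i]][, j] <- ref[[i]][, j] - d[j]
}

# Each candidate is computed once from the untouched input.
candidates <- list(
  transpose = to_df(lapply(env, function(i) t(t(i) - d))),
  scale     = to_df(lapply(env, scale, center = d, scale = FALSE)),
  sweep     = lapply(env, sweep, 2, d, "-")
)

# all.equal() compares with a numeric tolerance; identical() is stricter.
checks <- vapply(candidates, function(x) isTRUE(all.equal(x, ref)), logical(1))
checks
```

Keeping the comparison separate from the timing run means the check does not depend on how often the benchmark evaluated each expression.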
After executing the "microbenchmark" function 5 times, solution ## 7 in the above code is the fastest, although solution ## 3 is only a little slower.
I will study in detail each of the solutions you proposed and, again, thank you very much!
As a token of appreciation, enjoy this song I really like! https://www.youtube.com/watch?v=QnguI5OrfZ4
Greetings!
Upvotes: 0
Reputation: 3905
My version of PoGibas' answer (+1):
lapply(lapply(lapply(env, t), "-", d), t)
It does exactly the same thing: it transposes the data.frame objects, subtracts d using recycling rules, and transposes the result back, returning matrix objects, which is not what OP wanted.
I thought that, as it uses vectorization more thoroughly, it would end up being a bit faster. That's not the case, though.
microbenchmark(
f1 = lapply(env, function(i) t(t(i) - d)),
f2 = lapply(lapply(lapply(env, t), "-", d), t), times = 1E5L)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# f1 99.838 103.104 114.8280 104.970 108.702 106230.106 1e+05 a
# f2 103.570 107.303 118.9683 110.102 113.834 7765.414 1e+05 b
Upvotes: 0
Reputation: 269624
1) sweep. Use sweep, producing a list of data frames:
lapply(env, sweep, 2, d, "-")
giving:
[[1]]
a b c
1 -11.28 15.52 -8.43
2 -11.27 15.50 -8.33
3 -11.27 15.50 -8.37
[[2]]
a b c
1 -6.27 4.22 -8.85
2 -6.26 4.19 -8.81
3 -6.28 4.23 -8.86
Also see "How to divide each row of a matrix by elements of a vector in R" for numerous expressions that are equivalent or nearly equivalent to sweep.
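For readers unfamiliar with the positional arguments, here is the same sweep call with its arguments named; a minimal sketch, assuming env and d as defined in the question (the name centered is illustrative only):

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

# MARGIN = 2 applies STATS column by column: column j has d[j] subtracted.
centered <- lapply(env, sweep, MARGIN = 2, STATS = d, FUN = "-")

# The elements are still data frames, with the original column names.
vapply(centered, is.data.frame, logical(1))
```

MARGIN = 1 would instead apply one value of STATS per row, which is not what is wanted here.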
2) scale. Or use scale like this; however, it gives a list of numeric matrices rather than a list of data frames:
lapply(env, scale, d, FALSE)
giving:
[[1]]
a b c
[1,] -11.28 15.52 -8.43
[2,] -11.27 15.50 -8.33
[3,] -11.27 15.50 -8.37
attr(,"scaled:center")
a b c
10.20 -10.91 11.89
[[2]]
a b c
[1,] -6.27 4.22 -8.85
[2,] -6.26 4.19 -8.81
[3,] -6.28 4.23 -8.86
attr(,"scaled:center")
a b c
10.20 -10.91 11.89
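If data frames are needed, the scale result can be converted back. A minimal sketch, assuming env and d as defined in the question (centered_df is an illustrative name):

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

# scale() returns a numeric matrix; as.data.frame() converts it back,
# keeping the column names.
centered_df <- lapply(env, function(df) {
  as.data.frame(scale(df, center = d, scale = FALSE))
})
centered_df
```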
Upvotes: 3
Reputation: 33488
Here is a hacky solution using lapply:
nrows <- 3
lapply(env, function(x) x - matrix(rep(d, nrows), nrow = nrows, byrow = TRUE))
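The hard-coded nrows only works when every data frame has three rows. A minimal sketch of a variant that reads the row count from each data frame, assuming env and d as defined in the question; the extra two-row data frame is hypothetical and added only to show differing row counts:

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

# Hypothetical extra element with a different number of rows.
env_mixed <- c(env, list(data.frame(a = c(0, 1), b = c(0, 1), c = c(0, 1))))

centered <- lapply(env_mixed, function(x) {
  # Build a matrix with one copy of d per row of x, so nrow is not hard-coded.
  x - matrix(d, nrow = nrow(x), ncol = length(d), byrow = TRUE)
})
centered
```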
Upvotes: 1
Reputation: 28339
There are two things you can simplify here: looping over the list elements and subtracting every value in d separately.
To replace the for loop you can use lapply ("l" as we're iterating over the list).
# Run function for every element i in list env
lapply(env, function(i) i)  # placeholder body: returns each element unchanged
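To simplify the subtraction you can:
1. Transpose the data frame: t(i)
2. Subtract d, which is recycled down each column of the transposed matrix: t(i) - d
3. Transpose the result back: t(t(i) - d)
So the final code would be: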
lapply(env, function(i) t(t(i) - d))
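A small sketch of why the double transpose matters and how to get data frames back, assuming env and d as defined in the question. The matrix m and the names wrong, right and centered_df are hypothetical and used only for illustration.

```r
env <- list(data.frame(a = c(-1.08, -1.07, -1.07),
                       b = c( 4.61,  4.59,  4.59),
                       c = c( 3.46,  3.56,  3.52)),
            data.frame(a = c( 3.93,  3.94,  3.92),
                       b = c(-6.69, -6.72, -6.68),
                       c = c( 3.04,  3.08,  3.03)))
d <- c(a = 10.20, b = -10.91, c = 11.89)

# R recycles a vector down the columns of a matrix (column-major order).
m <- matrix(0, nrow = 2, ncol = 3)

wrong <- m - d        # d[2] lands in row 2 of column 1: misaligned
right <- t(t(m) - d)  # after transposing, d[j] lines up with column j

# The same idea applied to the list, converting each matrix back
# to a data frame.
centered_df <- lapply(env, function(i) as.data.frame(t(t(i) - d)))
centered_df
```

Without the transposes the subtraction still runs without error whenever the matrix size is a multiple of length(d), so the misalignment is easy to miss.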
Upvotes: 8