Reputation: 950
I want to extract elements from a list based on indices stored in a separate vector.
This is my attempt at it:
list_positions <- c(2, 3, 4)
my_list <- list(c(1, 3, 4), c(2, 3, 4, 5, 6), c(1, 2, 3, 4, 6))
my_fun <- function(x, y) {
  x[y]
}
mapply(my_fun, x = my_list, y = list_positions)
Maybe somebody can suggest a faster solution. My list has around 14 million elements. I tried a parallel solution, using clusterMap instead of mapply, but I would still like better performance.
Upvotes: 3
Views: 1946
Reputation: 887951
We may unlist the list, create an index based on the lengths of 'my_list', and extract the elements from the resulting vector:
v1 <- unlist(my_list)
p1 <- list_positions
v1[cumsum(lengths(my_list))- (lengths(my_list)-p1)]
#[1] 3 4 4
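The index arithmetic above can be unpacked as "offset of each sub-vector within the flattened vector, plus the position inside that sub-vector". The following is a minimal sketch of that idea, not the answer's exact code: the helper name extract_by_position and the input checks are assumptions added for illustration, and it assumes every list element is an atomic vector of the same type (unlist would otherwise coerce to a common type).

```r
# Hypothetical helper (name and checks are illustrative, not from the answer).
extract_by_position <- function(lst, pos) {
  lens <- lengths(lst)
  # Each position must point inside its own sub-vector.
  stopifnot(length(pos) == length(lst), all(pos >= 1), all(pos <= lens))
  # Number of values that precede each sub-vector in the flattened vector.
  starts <- cumsum(lens) - lens
  # starts + pos equals cumsum(lens) - (lens - pos) from the answer.
  unlist(lst, use.names = FALSE)[starts + pos]
}

my_list <- list(c(1, 3, 4), c(2, 3, 4, 5, 6), c(1, 2, 3, 4, 6))
list_positions <- c(2, 3, 4)
extract_by_position(my_list, list_positions)
#[1] 3 4 4
```

For inputs whose total flattened length approaches the integer limit, converting lens with as.numeric() before cumsum() avoids integer overflow.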
set.seed(24)
lst <- lapply(1:1e6, function(i) sample(1:10, sample(2:5, 1), replace=FALSE)) # each element gets a single random length between 2 and 5
p2 <- sapply(lst, function(x) sample(length(x), 1))
system.time({
r1 <- mapply(`[`, lst, p2)
})
# user system elapsed
# 1.84 0.02 1.86
system.time( r4 <- mapply(my_fun, lst, p2) )
# user system elapsed
# 1.88 0.01 1.89
system.time({ r4 <- mapply(my_fun, lst, p2) }) #placing inside the {}
# user system elapsed
# 2.31 0.00 2.31
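system.time({ ## cccmir's function, my_func1 (defined in the other answer)
r3 <- mapply(my_func1, lst, p2) # my_func1 already loops over the whole list, so the direct call would be my_func1(lst, p2); the timing below is for the mapply call as written
})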
# user system elapsed
# 12.10 0.03 12.13
system.time({
v2 <- unlist(lst)
r2 <- v2[cumsum(lengths(lst))- (lengths(lst)-p2)]
})
# user system elapsed
# 0.14 0.00 0.14
identical(r1, r2)
#[1] TRUE
Upvotes: 3
Reputation: 1003
You could use a for loop in this case, for example:
library(microbenchmark)
list_positions <- c(2, 3, 4)
my_list <- list(c(1, 3, 4), c(2, 3, 4, 5, 6), c(1, 2, 3, 4, 6))
my_fun <- function(x, y) {
  x[y]
}
mapply(my_fun, x = my_list, y = list_positions)
# for-loop version: preallocate the result, then fill it element by element
my_func1 <- function(aList, positions) {
  res <- numeric(length(aList))
  for (i in seq_along(aList)) {
    res[i] <- aList[[i]][positions[i]]
  }
  return(res)
}
# @akrun's unlist-based version; uses the aList argument rather than the global my_list
my_func2 <- function(aList, positions) {
  v1 <- unlist(aList)
  p1 <- positions
  v1[cumsum(lengths(aList)) - (lengths(aList) - p1)]
}
microbenchmark(
  mapply(my_fun, x = my_list, y = list_positions),
  my_func1(my_list, list_positions),
  my_func2(my_list, list_positions),
  times = 1000
)
#Unit: microseconds
#                                            expr    min     lq      mean median     uq     max neval
# mapply(my_fun, x = my_list, y = list_positions) 12.764 13.858 17.453172 14.588 16.775 119.613  1000
#               my_func1(my_list, list_positions)  5.106  5.835  7.328412  6.200  6.929  38.292  1000
#               my_func2(my_list, list_positions)  2.553  3.282  4.337367  3.283  3.648  52.514  1000
@akrun's solution (my_func2 here) is the fastest of the three on this three-element example.
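Because the benchmark above runs on a three-element list, it may be worth repeating the comparison on input closer to the real workload. The sketch below is one way to do that under stated assumptions: the names loop_extract and flat_extract, the list size n, and the random data are all made up for illustration, and no timings are claimed since they depend on the machine.

```r
set.seed(1)
n <- 1e4  # assumed size; increase to approximate the real list
# Random numeric vectors of length 2 to 5, and one valid position per vector.
lst <- lapply(seq_len(n), function(i) as.numeric(sample(10, sample(2:5, 1))))
pos <- vapply(lst, function(x) sample(length(x), 1), integer(1))

# for-loop approach (same idea as my_func1)
loop_extract <- function(lst, pos) {
  res <- numeric(length(lst))
  for (i in seq_along(lst)) res[i] <- lst[[i]][pos[i]]
  res
}

# unlist-and-index approach (same idea as my_func2)
flat_extract <- function(lst, pos) {
  lens <- lengths(lst)
  unlist(lst, use.names = FALSE)[cumsum(lens) - (lens - pos)]
}

t_loop <- system.time(res_loop <- loop_extract(lst, pos))
t_flat <- system.time(res_flat <- flat_extract(lst, pos))

# Both approaches should return the same values.
identical(res_loop, res_flat)
# Elapsed seconds for each approach on this machine.
print(c(loop = t_loop[["elapsed"]], flat = t_flat[["elapsed"]]))
```

The data are generated as doubles so that both functions return the same type and identical() compares like with like.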
Upvotes: 2