Reputation: 2638
I would like to find an efficient way to do the following lookup in a list:
L = list(10:15,11:20)
a = c(3,7)
b = numeric()
for(i in 1:length(a)) b[i] = L[[i]][a[i]]
I think `for` loops are inefficient, and I imagine this can be done faster using, for example, `sapply`. My main goal is to do this efficiently when `L` is long.
Upvotes: 1
Views: 138
Reputation: 16981
UPDATE:
Your aversion to a `for` loop may be unfounded. I've found that its performance can be very machine-dependent. On my current machine, with `b` preallocated to its final length, a base R `for` loop is slower only than the `Rcpp` solution, and just barely. See the updated benchmark below, where `loop1` preallocates `b` and `loop2` grows it one element at a time. However, I've tried this on other machines, and on some of them the `for` loops are indeed slower than the `apply` solutions.
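To make the initialization point concrete, here is a minimal sketch of the preallocated loop on the small example from the question. The names `L`, `a`, and `b` follow the question; nothing here is tuned or benchmarked.

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# Allocate b at its final length up front so the loop never has to grow it.
b <- integer(length(a))
for (i in seq_along(a)) {
  b[i] <- L[[i]][a[i]]
}
b
```

Starting from `b <- integer()` instead gives the same result, but R may have to reallocate `b` as it grows.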
A base R vectorized solution using `unlist`, `cumsum`, and `lengths`:
# head(..., -1L) drops the last cumulative length, which also keeps length(L) == 1 working
b <- unlist(L)[a + c(0, head(cumsum(lengths(L)), -1L))]
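The idea behind this one-liner is to turn each within-element index into a position in the flattened vector by adding the number of values that come before that element. A step-by-step sketch on the example from the question (the intermediate names `offsets` and `flat_index` are made up for illustration):

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# Number of values that precede each list element once L is flattened:
# the first element starts at offset 0, the second after the 6 values of L[[1]].
offsets <- c(0L, head(cumsum(lengths(L)), -1L))

# Position of each requested value in the flattened vector.
flat_index <- a + offsets

# A single vectorized subset replaces the per-element lookups.
b <- unlist(L)[flat_index]
b
```

Here `offsets` is `0 6` and `flat_index` is `3 13`, so the result matches the loop version.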
Benchmarking (tossing in an `Rcpp` solution)*
library(purrr)
L <- lapply(sample(4:10, 1e5, TRUE), seq)
a <- sapply(lengths(L), function(x) sample(x, 1))
Rcpp::cppFunction("IntegerVector ListIndex(const List& L, const IntegerVector& a) {
  const int n = a.size();
  IntegerVector b (n);
  for (int i = 0; i < n; i++) b(i) = as<IntegerVector>(L[i])(a(i) - 1);
  return b;
}")
microbenchmark::microbenchmark(sapply = sapply(1:length(a), function(x) L[[x]][a[x]]),
                               vapply = vapply(seq_along(L), function(i) L[[i]][a[i]], integer(1)),
                               purr = as.integer(imap_dbl(setNames(L, a), ~ .x[as.numeric(.y)])),
                               unlist = unlist(L)[a + c(0, cumsum(lengths(L)[1:(length(L) - 1L)]))],
                               rcpp = ListIndex(L, a),
                               loop1 = {b <- integer(length(a)); for(i in seq_along(a)) b[i] <- L[[i]][a[i]]; b},
                               loop2 = {b <- integer(); for(i in seq_along(a)) b[i] <- L[[i]][a[i]]; b},
                               check = "identical")
#> Unit: milliseconds
#>   expr      min        lq      mean    median        uq      max neval
#> sapply 102.4199 113.72450 125.21764 119.72455 130.41480 291.5465   100
#> vapply  97.8447 107.33390 116.41775 112.33445 119.01680 189.9191   100
#>   purr 226.9039 241.02305 258.34032 246.81175 257.87370 502.3446   100
#> unlist  29.4186  29.97935  32.05529  30.86130  33.02160  44.6751   100
#>   rcpp  22.3468  22.78460  25.47667  23.48495  26.63935  37.2362   100
#>  loop1  25.5240  27.34865  28.94650  28.02920  29.32110  42.9779   100
#>  loop2  41.4726  46.04130  52.58843  51.00240  56.54375  88.3444   100
*I couldn't get akrun's `dplyr` solution to work with the larger vector.
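One possible explanation, offered as a guess rather than something verified against that solution: in the benchmark data the values in `a` repeat, so `setNames(L, a)` produces duplicate names, and `stack()` then puts every list element that shares a name under the same factor level. A small base R sketch with a made-up two-element input (`L_dup` and `a_dup` are hypothetical names) shows the effect:

```r
# Hypothetical input in which both lookups use the same index, so names repeat.
L_dup <- list(10:15, 11:20)
a_dup <- c(3, 3)

stacked <- stack(setNames(L_dup, a_dup))

# Both list elements end up under the single level "3", so grouping by `ind`
# can no longer tell them apart.
levels(stacked$ind)
nrow(stacked)
```

Grouping on `ind` would then yield one group where two results are expected.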
Upvotes: 2
Reputation: 887291
We could use
library(dplyr)
stack(setNames(L, a)) %>%
  group_by(ind) %>%
  summarise(out = values[[as.numeric(as.character(first(ind)))]]) %>%
  pull(out)
[1] 12 17
Or in base R using `vapply`, which would be faster:
vapply(seq_along(L), \(i) L[[i]][a[i]], numeric(1))
[1] 12 17
Or use `imap` as a compact option:
library(purrr)
imap_dbl(setNames(L, a), ~ .x[as.numeric(.y)])
3 7
12 17
Upvotes: 3
Reputation: 19107
Another `apply` method would be `sapply()`.
sapply(1:length(a), function(x) L[[x]][a[x]])
[1] 12 17
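A small caveat, sketched under the assumption that the input could ever be empty: `1:length(a)` evaluates to `c(1, 0)` when `a` has length zero, whereas `seq_along(a)` gives an empty sequence. In that case `sapply()` also returns an empty list rather than an empty vector, which `vapply()` avoids. The names `L0`, `a0`, and `pick` below are made up for the illustration.

```r
L0 <- list()
a0 <- integer(0)
pick <- function(i) L0[[i]][a0[i]]

# 1:length(a0) counts down from 1 to 0; seq_along(a0) is empty.
bad_index <- 1:length(a0)
good_index <- seq_along(a0)

# sapply() cannot know the intended type and falls back to an empty list.
res_sapply <- sapply(good_index, pick)

# vapply() is told the type up front and returns an empty integer vector.
res_vapply <- vapply(good_index, pick, integer(1))
```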
Upvotes: 3
Reputation: 173978
You could use `Map` or `mapply`. Since `mapply` can automatically simplify to a vector, we can use that here to get `b` in one go:
b <- mapply(function(list_members, indices) list_members[indices],
list_members = L, indices = a, SIMPLIFY = TRUE)
b
#> [1] 12 17
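For comparison, here is a minimal sketch of the `Map` variant mentioned above, along with a shorter `mapply` call that passes the `[` function directly. The variable names `b_map` and `b_mapply` are made up for illustration.

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# Map() always returns a list, so unlist() flattens it to a vector.
b_map <- unlist(Map(function(x, i) x[i], L, a))

# mapply() with the `[` function itself simplifies to a vector directly.
b_mapply <- mapply(`[`, L, a)
```

Both give the same values as the loop in the question; which reads better is mostly a matter of taste.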
Upvotes: 2