tomka

Reputation: 2638

How to look up specific indices of vectors in a list of vectors, where the indices are given in a vector? (without a for loop)

I would like to find an efficient operation to do the following look up in a list:

L = list(10:15,11:20)
a = c(3,7)
b = numeric()
for(i in 1:length(a)) b[i] = L[[i]][a[i]]

I think for loops are inefficient and I imagine this can be done faster using, for example, sapply. My main goal is to do this efficiently when L is long.

Upvotes: 1

Views: 138

Answers (4)

jblood94

Reputation: 16981

UPDATE:

Your aversion to a for loop may be unfounded: I've found that loop performance can be very machine dependent. On my current machine, with b preallocated to its final length, a base R for loop is beaten only by an Rcpp solution, and just barely. See the updated benchmark below, where loop1 preallocates b and loop2 grows it from an empty vector. On some other machines I've tried, however, the for loops are indeed slower than the apply solutions.
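
As a minimal sketch of what "properly initialized" means here, using the small L and a from the question: b is allocated once at its final length, so each iteration only fills a slot instead of extending the vector.

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# Allocate b once at its final length instead of growing it from numeric()
b <- integer(length(a))
for (i in seq_along(a)) {
  # Take the a[i]-th element of the i-th vector in L
  b[i] <- L[[i]][a[i]]
}
b
```

Using seq_along(a) rather than 1:length(a) also keeps the loop from running when a is empty.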


A base R vectorized solution using unlist, cumsum, and lengths:

b <- unlist(L)[a + c(0, cumsum(lengths(L)[1:(length(L) - 1L)]))]
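
To make the indexing in that one-liner easier to follow, here is the same idea split into steps on the question's data. The names flat and offsets are only for illustration, and the offsets are built with a negative index instead of 1:(length(L) - 1L), a small variation that should also behave when L has a single element.

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# Concatenate all vectors into one: positions 1-6 come from L[[1]], 7-16 from L[[2]]
flat <- unlist(L)

# Number of elements that precede each list member inside flat: 0 for the first,
# then the running total of the earlier lengths
offsets <- cumsum(c(0L, lengths(L)[-length(L)]))

# Shift each within-vector index by its offset and index the flat vector once
b <- flat[a + offsets]
b
```

This assumes every element of L is an atomic vector of the same type, so that unlist does not coerce anything.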

Benchmarking (tossing in an Rcpp solution)*

library(purrr)

L <- lapply(sample(4:10, 1e5, TRUE), seq)
a <- sapply(lengths(L), function(x) sample(x, 1))

Rcpp::cppFunction("IntegerVector ListIndex(const List& L, const IntegerVector& a) {
  const int n = a.size();
  IntegerVector b (n);
  for (int i = 0; i < n; i++) b(i) = as<IntegerVector>(L[i])(a(i) - 1);
  return b;
}")
    
microbenchmark::microbenchmark(
  sapply = sapply(1:length(a), function(x) L[[x]][a[x]]),
  vapply = vapply(seq_along(L), function(i) L[[i]][a[i]], integer(1)),
  purr = as.integer(imap_dbl(setNames(L, a), ~ .x[as.numeric(.y)])),
  unlist = unlist(L)[a + c(0, cumsum(lengths(L)[1:(length(L) - 1L)]))],
  rcpp = ListIndex(L, a),
  # loop1 preallocates b; loop2 grows it from an empty vector
  loop1 = {b <- integer(length(a)); for(i in seq_along(a)) b[i] <- L[[i]][a[i]]; b},
  loop2 = {b <- integer(); for(i in seq_along(a)) b[i] <- L[[i]][a[i]]; b},
  check = "identical"
)

#> Unit: milliseconds
#>    expr      min       lq      mean    median       uq      max neval
#> sapply 102.4199 113.72450 125.21764 119.72455 130.41480 291.5465   100
#> vapply  97.8447 107.33390 116.41775 112.33445 119.01680 189.9191   100
#>   purr 226.9039 241.02305 258.34032 246.81175 257.87370 502.3446   100
#> unlist  29.4186  29.97935  32.05529  30.86130  33.02160  44.6751   100
#>   rcpp  22.3468  22.78460  25.47667  23.48495  26.63935  37.2362   100
#>  loop1  25.5240  27.34865  28.94650  28.02920  29.32110  42.9779   100
#>  loop2  41.4726  46.04130  52.58843  51.00240  56.54375  88.3444   100

*I couldn't get akrun's dplyr solution to work with the larger vector.

Upvotes: 2

akrun

Reputation: 887291

We could use

library(dplyr)
stack(setNames(L, a)) %>%
   group_by(ind) %>% 
   summarise(out = values[[as.numeric(as.character(first(ind)))]]) %>%
   pull(out)
[1] 12 17

Or in base R using vapply, which would be faster:

vapply(seq_along(L), \(i) L[[i]][a[i]], numeric(1))
[1] 12 17

Or use imap as a compact option:

library(purrr)
imap_dbl(setNames(L, a), ~ .x[as.numeric(.y)])
 3  7 
12 17 

Upvotes: 3

benson23

Reputation: 19107

Another apply method would be sapply().

sapply(1:length(a), function(x) L[[x]][a[x]])
[1] 12 17

Upvotes: 3

Allan Cameron

Reputation: 173978

You could use Map or mapply. Since mapply can automatically simplify to a vector, we can use it here to get b in one go:

b <- mapply(function(list_members, indices) list_members[indices],
       list_members = L, indices = a, SIMPLIFY = TRUE)

b
#> [1] 12 17
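
For comparison, a sketch of the Map route on the same data. Map always returns a list, so it needs one extra unlist step to get a plain vector; the name b_list is just for illustration.

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# Map pairs each vector in L with its index in a and returns a list of results
b_list <- Map(function(list_member, index) list_member[index], L, a)

# Flatten the list of length-one results into a plain vector
b <- unlist(b_list)
b
```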

Upvotes: 2
