Reputation: 173
Suppose I have the following named numeric vector:
a <- 1:8
names(a) <- rep(c('I', 'II'), each = 4)
How can I convert this vector to a list of length 2 (shown below)?
a.list
# $I
# [1] 1 2 3 4
# $II
# [1] 5 6 7 8
Note that as.list(a) is not what I'm looking for.
My very unsatisfying (and slow for large vectors) solution is:
names.uniq <- unique(names(a))
a.list <- setNames(vector('list', length(names.uniq)), names.uniq)
for(i in 1:length(names.uniq)) {
  names.i <- names.uniq[i]
  a.i <- a[names(a)==names.i]
  a.list[[names.i]] <- unname(a.i)
}
Thank you in advance for your help, Devin
Upvotes: 16
Views: 15225
Reputation: 21
A quick solution is to use lapply(): the elements of the list it creates take their names from the vector or list the function is applied to. So in this case:
> a <- 1:8
> names(a) <- rep(c('I', 'II'), each = 4)
> library(magrittr)  # provides the %>% pipe
> a %>% lapply(function(x) x)
$I
[1] 1
$I
[1] 2
$I
[1] 3
$I
[1] 4
$II
[1] 5
$II
[1] 6
$II
[1] 7
$II
[1] 8
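A small check of the naming behaviour described above, written as a sketch with base lapply() and no pipe. The variable names `res` and `grouped` are only illustrative. The result has one element per vector element, so a grouping step such as split() still seems to be needed to reach the length-2 list from the question.

```r
a <- 1:8
names(a) <- rep(c("I", "II"), each = 4)

# lapply() carries the names of its input over to the result
res <- lapply(a, function(x) x)
length(res)                      # one list element per vector element
identical(names(res), names(a))  # names are copied over unchanged

# grouping by name needs an extra step, for example split()
grouped <- split(unname(a), names(a))
length(grouped)
```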
Upvotes: 0
Reputation: 10010
To also handle unnamed vectors, use:
vec_to_list <- function(vec) {
  # seq_along() is safe for zero-length input, unlike 1:length(vec)
  if (is.null(names(vec))) names(vec) <- seq_along(vec)
  split(unname(vec), names(vec))
}
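A short usage sketch for the helper above, covering one named and one unnamed input. The inputs `named` and `unnamed` are made up for illustration, and the helper is repeated (with seq_along()) so the block runs on its own.

```r
# Helper repeated here so the example is self-contained
vec_to_list <- function(vec) {
  if (is.null(names(vec))) names(vec) <- seq_along(vec)
  split(unname(vec), names(vec))
}

# Named input: values are grouped by name
named <- c(I = 1, I = 2, II = 3)
res_named <- vec_to_list(named)

# Unnamed input: positions become the names, one element per value
unnamed <- c(10, 20, 30)
res_unnamed <- vec_to_list(unnamed)
names(res_unnamed)
```

One thing to keep in mind: the generated names are character strings, so with ten or more unnamed elements split() will typically order them as "1", "10", "2", ... rather than by position.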
Upvotes: 0
Reputation: 193677
I'd suggest looking at packages that excel at aggregating large amounts of data, like the data.table package. With data.table, you could do:
a <- 1:5e7
names(a) <- c(rep('I',1e7), rep('II',1e7), rep('III',1e7),
rep('IV',1e7), rep('V',1e7))
library(data.table)
temp <- data.table(names(a), a)[, list(V2 = list(a)), V1]
a.list <- setNames(temp[["V2"]], temp[["V1"]])
Here are some functions to test the various options out with:
myFun <- function(invec) {
  x <- data.table(names(invec), invec)[, list(V2 = list(invec)), V1]
  setNames(x[["V2"]], x[["V1"]])
}
rui1 <- function(invec) {
  a.list <- split(invec, names(invec))
  lapply(a.list, unname)
}
rui2 <- function(invec) {
  split(unname(invec), names(invec))
}
op <- function(invec) {
  names.uniq <- unique(names(invec))
  a.list <- setNames(vector('list', length(names.uniq)), names.uniq)
  for(i in 1:length(names.uniq)) {
    names.i <- names.uniq[i]
    # subset the argument, not the global `a`
    a.i <- invec[names(invec) == names.i]
    a.list[[names.i]] <- unname(a.i)
  }
  a.list
}
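Before comparing timings, it may be worth checking that the approaches return the same thing. The sketch below uses two hypothetical helpers, `loop_group` and `split_group`, which stand in for the loop-based and split()-based functions above (base R only). It suggests one difference to watch for: the loop keeps names in order of first appearance, while split() orders them by sorted factor level.

```r
# Hypothetical stand-in for the loop-based approach
loop_group <- function(invec) {
  names.uniq <- unique(names(invec))
  out <- setNames(vector("list", length(names.uniq)), names.uniq)
  for (nm in names.uniq) out[[nm]] <- unname(invec[names(invec) == nm])
  out
}

# Hypothetical stand-in for the split()-based approach
split_group <- function(invec) split(unname(invec), names(invec))

# Names deliberately not in sorted order
x <- c(b = 1, a = 2, b = 3, a = 4)
by_loop  <- loop_group(x)    # names in order of first appearance
by_split <- split_group(x)   # names in sorted level order

# Same groups once the loop result is reordered by name
same_groups <- identical(by_loop[order(names(by_loop))], by_split)

# To keep first-appearance order with split(), fix the factor levels explicitly
by_split_ordered <- split(unname(x),
                          factor(names(x), levels = unique(names(x))))
```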
And the results of microbenchmark on 10 replications:
library(microbenchmark)
microbenchmark(myFun(a), rui1(a), rui2(a), op(a), times = 10)
# Unit: milliseconds
#      expr       min        lq      mean    median       uq      max neval
#  myFun(a)  698.1553  768.6802  932.6525  934.6666 1056.558 1168.889    10
#   rui1(a) 2967.4927 3097.6168 3199.9378 3185.1826 3319.453 3413.185    10
#   rui2(a) 2152.0307 2285.4515 2372.9896 2362.7783 2426.821 2643.033    10
#     op(a) 2672.4703 2872.5585 2896.7779 2901.7979 2971.782 3039.663    10
Also, note that in testing the different solutions, you might want to consider other scenarios, for instance, cases where you expect to have lots of different names. In that case, your for loop slows down significantly. Try, for example, the above functions with the following data:
set.seed(1)
b <- sample(100, 5e7, TRUE)
names(b) <- sample(c(letters, LETTERS, 1:100), 5e7, TRUE)
Upvotes: 5
Reputation: 173
Testing Rui Barradas' solution vs. my original solution on a larger vector:
a <- 1:5e7
names(a) <- c(rep('I',1e7), rep('II',1e7), rep('III',1e7), rep('IV',1e7), rep('V',1e7))
Rui's
st1 <- Sys.time()
a.list <- split(a, names(a))
a.list <- lapply(a.list, unname)
Sys.time() - st1
Time difference of 2.560906 secs
Mine
st1 <- Sys.time()
names.uniq <- unique(names(a))
a.list <- setNames(vector('list', length(names.uniq)), names.uniq)
for(i in 1:length(names.uniq)) {
  names.i <- names.uniq[i]
  a.i <- a[names(a)==names.i]
  a.list[[names.i]] <- unname(a.i)
}
Sys.time() - st1
Time difference of 2.712066 secs
thelatemail's
st1 <- Sys.time()
a.list <- split(unname(a),names(a))
Sys.time() - st1
Time difference of 1.62851 secs
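As an alternative to subtracting Sys.time() values, system.time() wraps the expression being timed. The sketch below assumes a scaled-down vector (`a_small`, 5e5 elements instead of 5e7) so it runs quickly. No timing figures are shown because they depend on the machine.

```r
# Scaled-down version of the test vector (assumption: 5e5 elements)
a_small <- 1:5e5
names(a_small) <- rep(c("I", "II", "III", "IV", "V"), each = 1e5)

# system.time() evaluates the expression and reports user/system/elapsed time;
# the assignment inside it still creates `res` in the calling environment
t_split <- system.time(res <- split(unname(a_small), names(a_small)))

t_split[["elapsed"]]   # elapsed seconds, machine dependent
lengths(res)           # size of each group
```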
Upvotes: 1
Reputation: 76651
Like I said in the comment, you can use split to create a list.
a.list <- split(a, names(a))
a.list <- lapply(a.list, unname)
A one-liner would be
a.list <- lapply(split(a, names(a)), unname)
#$I
#[1] 1 2 3 4
#
#$II
#[1] 5 6 7 8
EDIT.
Then thelatemail posted a simplification of this in his comment. I've timed it using Devin King's approach, and it's not only simpler, it's also 25% faster.
a.list <- split(unname(a),names(a))
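A quick check, as a sketch on the small vector from the question, that the one-liner and the simplified form give the same list, and that it matches the requested output. The names `one_liner`, `simplified` and `target` are only illustrative.

```r
a <- 1:8
names(a) <- rep(c("I", "II"), each = 4)

# split first, then strip the names from each piece
one_liner <- lapply(split(a, names(a)), unname)

# strip the names first, then split by the original names
simplified <- split(unname(a), names(a))

# the list asked for in the question
target <- list(I = 1:4, II = 5:8)

identical(one_liner, simplified)
identical(simplified, target)
```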
Upvotes: 28