Reputation: 207
I am looking for the fastest way to combine 100,000 lists into a single data frame. This is not a plain do.call(rbind) problem: I want all the values in one column, plus a second column holding the minimum of the list each value came from (see my code below for the expected output).
I have tried two approaches that work but are quite slow, so I am looking for something using data.table, dplyr, or anything else that would be faster.
Example to reproduce what I want:
a <- c(1:3)
b <- c(12:20)
relations <- list(a,b)
Here are the two solutions I tried.
Solution 1: concatenate data frames with rbind, looping over the elements of the list:
full_group <- NULL
for (i in 1:length(relations)) {
  full_group <- rbind(full_group,
                      data.frame(id = relations[[i]],
                                 group = min(relations[[i]])))
  print(i)
}
Solution 2: concatenate vectors, then create a data frame from the results:
full_group <- NULL
groups <- NULL
id <- NULL
for (i in 1:length(relations)) {
  id <- c(id, relations[[i]])
  groups <- c(groups, rep(min(relations[[i]]), length(relations[[i]])))
  print(i)
}
full_group <- data.frame(id = id,
                         groups = groups)
Upvotes: 0
Views: 816
Reputation: 26446
Judging by the output of your second solution, you want what stack does to lists:
stack(setNames(relations,sapply(relations,min)))
   values ind
1       1   1
2       2   1
3       3   1
4      12  12
5      13  12
6      14  12
7      15  12
8      16  12
9      17  12
10     18  12
11     19  12
12     20  12
The call to setNames here sets the names for the groups, in this case the minimum element of each list. The same code works with melt from reshape2 in place of stack, which, as @akrun points out, may be faster.
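To make the naming step concrete, here is a minimal sketch on the question's example data. The variable names `named` and `res` are only for illustration and are not part of the original code.

```r
a <- 1:3
b <- 12:20
relations <- list(a, b)

# Name each list element after its minimum; names are stored as character.
named <- setNames(relations, sapply(relations, min))
names(named)

# stack() turns those names into the `ind` column, one row per value.
res <- stack(named)
str(res)
```

Inspecting the result with str() shows how each column is stored, which matters if you plan to compare or join on the group column later.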
stack and melt, however, will store the group as a factor and a character, respectively. If a numeric group is desired (probably the case here), use a slight modification of stack's underlying code:
stack2 <- function(x, i) data.frame(values = unlist(x, use.names = FALSE), ind = rep.int(i, lengths(x)))
stack2(relations,sapply(relations,min))
This is what @alexis_laz suggested in the comments.
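As a sanity check, the following sketch compares a vectorized build against the loop from the question's second solution on the small example. It assumes R 3.2.0 or later for lengths(); the names `fast` and `slow` are hypothetical and used only here.

```r
relations <- list(1:3, 12:20)

# Vectorized version: one unlist() and one rep.int(), no objects grown in a loop.
ids    <- unlist(relations, use.names = FALSE)
groups <- rep.int(sapply(relations, min), lengths(relations))
fast   <- data.frame(id = ids, groups = groups)

# Loop version from the question, kept only as a reference result.
id  <- NULL
grp <- NULL
for (r in relations) {
  id  <- c(id, r)
  grp <- c(grp, rep(min(r), length(r)))
}
slow <- data.frame(id = id, groups = grp)

# Compare the two data frames.
all.equal(fast, slow)
```

The loop is slow on 100,000 lists mainly because c() copies the growing vectors on every iteration, while the vectorized version allocates each column once.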
Upvotes: 4