patpat

Reputation: 207

Fastest way to union all

I am looking for the fastest way to "union all" 100,000 vectors stored in a list into one data frame. This is not a plain do.call(rbind, ...) problem, because I want all the values in a single column plus a group column holding the minimum of the vector each value came from (see my code below for the expected output).

I have tried two approaches that work but are quite slow, so I am looking for something faster, using data.table, dplyr, or anything else that helps.

Example to reproduce what I want:

a <- 1:3
b <- 12:20
relations <- list(a, b)

Here are the two solutions I tried.

Solution 1: loop over the elements of the list and concatenate data frames with rbind:

full_group <- NULL
for (i in seq_along(relations)) {
  # rbind copies the growing data frame on every iteration
  full_group <- rbind(full_group,
                      data.frame(id = relations[[i]],
                                 group = min(relations[[i]])))
  print(i)  # progress
}
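A common way to speed up solution 1, sketched here rather than benchmarked, is to stop growing the data frame inside the loop: build one small data frame per list element with lapply, then call rbind once through do.call. This removes the repeated copying, although binding 100,000 small data frames can still take a while. The name `pieces` is only illustrative.

```r
a <- 1:3
b <- 12:20
relations <- list(a, b)

# One data frame per list element; nothing is copied between iterations
pieces <- lapply(relations, function(x) data.frame(id = x, group = min(x)))

# Bind all pieces in a single call instead of once per iteration
full_group <- do.call(rbind, pieces)
print(full_group)
```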

Solution 2: concatenate the vectors, then build a data frame from the result:

id <- NULL
groups <- NULL
for (i in seq_along(relations)) {
  # c() also copies the growing vectors on every iteration
  id <- c(id, relations[[i]])
  groups <- c(groups, rep(min(relations[[i]]), length(relations[[i]])))
  print(i)  # progress
}

full_group <- data.frame(id = id,
                         groups = groups)
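Solution 2 is slow for a similar reason: each c(id, ...) call copies the growing vector. A sketch that keeps the loop but preallocates id and groups to their final length (computed with lengths), assuming no element of relations is empty; the helper names `n`, `pos` and `idx` are just illustrative:

```r
relations <- list(1:3, 12:20)

# Final length is known up front, so allocate once
n <- sum(lengths(relations))
id <- integer(n)
groups <- integer(n)

pos <- 0L  # number of slots already filled
for (i in seq_along(relations)) {
  x <- relations[[i]]
  idx <- pos + seq_along(x)  # slots reserved for this element
  id[idx] <- x
  groups[idx] <- min(x)      # single value recycled over the slots
  pos <- pos + length(x)
}

full_group <- data.frame(id = id, groups = groups)
```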

Upvotes: 0

Views: 816

Answers (1)

A. Webb

Reputation: 26446

Judging by the output of your second solution, you want what stack does to lists:

stack(setNames(relations,sapply(relations,min)))
   values ind
1       1   1
2       2   1
3       3   1
4      12  12
5      13  12
6      14  12
7      15  12
8      16  12
9      17  12
10     18  12
11     19  12
12     20  12

The call to setNames names each list element after its group, here the minimum element of that vector. The same code works with melt from reshape2 in place of stack, which, as @akrun points out, may be faster.
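For reference, the stack call returns its group column ind as a factor. If you keep stack, here is a sketch of turning that column back into numbers: go through as.character first, because as.numeric on a factor returns the internal level codes (1, 2, ...) rather than the labels. The variable name `res` is only illustrative.

```r
relations <- list(1:3, 12:20)

res <- stack(setNames(relations, sapply(relations, min)))
class(res$ind)  # "factor"

# Wrong: as.numeric(res$ind) would give the level codes 1 and 2
# Right: convert the labels "1" and "12" back to numbers
res$ind <- as.integer(as.character(res$ind))
```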

Stack and melt, however, store the group as a factor and as character, respectively. If you want a numeric group (probably the case here), use a slight modification of stack's underlying code:

stack2 <- function(x, i) data.frame(values = unlist(x, use.names = FALSE),
                                    ind = rep.int(i, lengths(x)))

stack2(relations,sapply(relations,min))

This is what @alexis_laz suggested in the comments.
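A quick sanity check, sketched on the example data: the stack2 helper (redefined here so the snippet runs on its own) should reproduce the output of solution 2 from the question, with a numeric group column instead of a factor. The variable name `out` is only illustrative.

```r
relations <- list(1:3, 12:20)

stack2 <- function(x, i) {
  # values: all elements in order; ind: each group value repeated once per element
  data.frame(values = unlist(x, use.names = FALSE),
             ind = rep.int(i, lengths(x)))
}

out <- stack2(relations, sapply(relations, min))

# Solution 2 from the question, for comparison
id <- NULL
groups <- NULL
for (i in seq_along(relations)) {
  id <- c(id, relations[[i]])
  groups <- c(groups, rep(min(relations[[i]]), length(relations[[i]])))
}

is.numeric(out$ind)                                      # TRUE
identical(out$values, id) && identical(out$ind, groups)  # TRUE
```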

Upvotes: 4
