Mark

Reputation: 1769

Build a dataframe from pairwise combinations of list elements

I have a list named list. The first 5 elements of this list are:

[[1]]
[1] "#solarpanels" "#solar"      

[[2]]
[1] "#Nuclear" "#Wind"    "#solar"  

[[3]]
[1] "#solar"

[[4]]
[1] "#steel"           "#windenergy"      "#solarenergy"     "#carbonfootprint"

[[5]]
[1] "#solar" "#wind"

I would like to delete elements like [[3]] because it contains only one element. Moreover, I would like to build a data frame containing all possible pairwise combinations within each element of the list. For example, a data frame with two columns (e.g. the first named A, the second B) such as:

A                  B
"#solarpanels"     "#solar"
"#Nuclear"         "#Wind"  
"#Nuclear"         "#solar"
"#steel"           "#windenergy"
"#steel"           "#solarenergy"
"#steel"           "#carbonfootprint"
"#windenergy"      "#carbonfootprint"
"#windenergy"      "#solarenergy"
"#solarenergy"     "#carbonfootprint"
"#solar"           "#wind"

I tried the following (just for one element):

for (i in 1:(length(list[[4]])-1)) {
  df$from = rep(list[[4]][i],length(list[[4]])-i)
  df$to = list[[4]][(i+1):length(list[[4]])]
}

where

df=data.frame(A=character(), 
                    B=character(),
                    stringsAsFactors=FALSE) 

but I obtained

data.frame`(`*tmp*`, A, value = c("#steel", "#steel",  : 
 replacement has 3 rows, data has 0

for i=1.

Upvotes: 2

Views: 230

Answers (1)

MichaelChirico

Reputation: 34703

Your data first:

l = list(
  c("#solarpanels", "#solar"),
  c("#Nuclear", "#Wind", "#solar"),
  "#solar",
  c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint"),
  c("#solar", "#wind")
)

Here's a two-liner version:

l = l[lengths(l) > 1L]
data.frame(do.call(rbind, unlist(lapply(l, combn, 2L, simplify = FALSE), recursive = FALSE)))
#              X1               X2
# 1  #solarpanels           #solar
# 2      #Nuclear            #Wind
# 3      #Nuclear           #solar
# 4         #Wind           #solar
# 5        #steel      #windenergy
# 6        #steel     #solarenergy
# 7        #steel #carbonfootprint
# 8   #windenergy     #solarenergy
# 9   #windenergy #carbonfootprint
# 10 #solarenergy #carbonfootprint
# 11       #solar            #wind

More slowly, for clarity:

combn(x, k) returns every possible (unordered) subset of size k from x; what you're after is the pairs from each element of the list. By default, it returns this as a matrix with p = choose(length(x), k) columns, but that's not a helpful format for your use case; simplify = FALSE returns each subset as a new element of a list instead.
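To make the two return shapes concrete, here is a small sketch on the fourth element of your list (the names x, m and pairs are only for illustration):

```r
x <- c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint")

# Default: a character matrix with k rows and choose(length(x), k) columns
m <- combn(x, 2L)
dim(m)

# simplify = FALSE: a list holding one length-2 vector per pair
pairs <- combn(x, 2L, simplify = FALSE)
length(pairs)
pairs[[1L]]
```

With 4 hashtags there are choose(4, 2) = 6 pairs, so the matrix has 6 columns and the list has 6 elements.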

So lapply(l, combn, 2L, simplify = FALSE) will look something like:

# [[1]]
# [[1]][[1]]
# [1] "#solarpanels" "#solar"      
# 
# 
# [[2]]
# [[2]][[1]]
# [1] "#Nuclear" "#Wind"   
# 
# [[2]][[2]]
# [1] "#Nuclear" "#solar"  

(We have to drop the length-1 elements of l first, since asking combn for 2 elements from a length-1 character vector is an error; that is what the first line of the two-liner does.)
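A small illustration of why the filter is needed, using a shortened copy of the list (res and kept are names made up for this sketch):

```r
l <- list(c("#solarpanels", "#solar"), "#solar", c("#solar", "#wind"))

# lengths() gives the length of each list element
lengths(l)

# combn() cannot pick 2 items out of 1; capture the error instead of stopping
res <- tryCatch(combn(l[[2L]], 2L), error = function(e) "error")
res

# Keep only the elements that contain at least one pair
kept <- l[lengths(l) > 1L]
length(kept)
```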

The lapply(.) bit is the crux of your issue; the rest is just kludging the output (which already has all the correct data) into a data.frame format.

First, the lapply output is nested: it's a list of lists. It's more uniform to have a list of length-2 vectors; unlist(., recursive = FALSE) accomplishes this by un-nesting the first level of lists (with recursive = TRUE, we'd wind up with one long vector and lose the paired structure; we could work with that, but it would be a bit unnatural).
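As a rough sketch of the difference, here is a hand-built nested list shaped like the lapply output for the first two elements (nested, flat and everything are illustrative names):

```r
nested <- list(
  list(c("#solarpanels", "#solar")),
  list(c("#Nuclear", "#Wind"), c("#Nuclear", "#solar"), c("#Wind", "#solar"))
)

# One level removed: a flat list of four length-2 vectors
flat <- unlist(nested, recursive = FALSE)
length(flat)

# Fully flattened: a single character vector of length 8, pairing lost
everything <- unlist(nested)
length(everything)
```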

Next, we turn the list of length-2 vectors into a matrix (with an eye to the end goal -- a 2-column matrix is very easy to convert to a data.frame); list->matrix is done in base with do.call(rbind, .).
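In case the do.call idiom is unfamiliar, a minimal sketch on three pairs (pairs and m are illustrative names): do.call passes each list element to rbind as a separate argument, so every pair becomes one row.

```r
pairs <- list(c("#Nuclear", "#Wind"), c("#Nuclear", "#solar"), c("#Wind", "#solar"))

# Equivalent to rbind(pairs[[1]], pairs[[2]], pairs[[3]])
m <- do.call(rbind, pairs)
dim(m)
m[2L, ]
```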

Finally we pass this to data.frame, et voila!
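If you want the columns named A and B as in your question rather than the default X1 and X2, one possible way (a sketch, not the only option) is to wrap the result in setNames; pairs and df are illustrative names:

```r
l <- list(
  c("#solarpanels", "#solar"),
  c("#Nuclear", "#Wind", "#solar"),
  "#solar",
  c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint"),
  c("#solar", "#wind")
)

# Drop elements that cannot form a pair, then collect all pairs in one flat list
l <- l[lengths(l) > 1L]
pairs <- unlist(lapply(l, combn, 2L, simplify = FALSE), recursive = FALSE)

# Bind the pairs row-wise and rename the two columns
df <- setNames(
  data.frame(do.call(rbind, pairs), stringsAsFactors = FALSE),
  c("A", "B")
)
nrow(df)
```

stringsAsFactors = FALSE only matters on R versions before 4.0.0, where data.frame would otherwise convert the strings to factors.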

In data.table, I would do it slightly cleaner and in one command:

library(data.table)
# transpose() turns the list of pairs into a list of two columns
setDT(transpose(
  unlist(lapply(l[lengths(l) > 1L], combn, 2L, simplify = FALSE), recursive = FALSE)
))[]

Given you likely don't care much about intermediate output, this would also be a good place to use magrittr:

library(magrittr)
l[lengths(l) > 1L] %>%
  lapply(combn, 2L, simplify = FALSE) %>% 
  unlist(recursive = FALSE) %>%
  do.call(rbind, . ) %>%
  data.frame

It's more readable, but in this case, it might be nice to see that data.frame is the end goal up-front, as the intent of the unlist & do.call steps might otherwise be obscure.

Upvotes: 8
