I have a list called list. The first 5 elements of this list are:
[[1]]
[1] "#solarpanels" "#solar"
[[2]]
[1] "#Nuclear" "#Wind" "#solar"
[[3]]
[1] "#solar"
[[4]]
[1] "#steel" "#windenergy" "#solarenergy" "#carbonfootprint"
[[5]]
[1] "#solar" "#wind"
I would like to delete elements like [[3]] because they contain only one element. Moreover, I would like to build a data frame containing all the possible pairwise combinations within each element of the list. For example, a data frame with two columns (e.g. the first named A, the second B) such as:
A B
"#solarpanels" "#solar"
"#Nuclear" "#Wind"
"#Nuclear" "#solar"
"#steel" "#windenergy"
"#steel" "#solarenergy"
"#steel" "#carbonfootprint"
"#windenergy" "#carbonfootprint"
"#windenergy" "#solarenergy"
"#solarenergy" "#carbonfootprint"
"#solar" "#wind"
I tried this (for just one element):
for (i in 1:(length(list[[4]])-1)) {
df$from = rep(list[[4]][i],length(list[[4]])-i)
df$to = list[[4]][(i+1):length(list[[4]])]
}
where
df=data.frame(A=character(),
B=character(),
stringsAsFactors=FALSE)
but I got
Error in `$<-.data.frame`(`*tmp*`, A, value = c("#steel", "#steel",  :
replacement has 3 rows, data has 0
for i = 1.
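The error occurs because df starts with zero rows, and a data frame column can't be assigned a vector whose length differs from the number of rows. One way to keep the loop idea is a minimal sketch like the following (not the asker's code; the names x and chunks are illustrative): build each batch of pairs as its own small data frame and stack them at the end.

```r
x <- c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint")

# One small data frame per i, stored in a pre-allocated list
chunks <- vector("list", length(x) - 1L)
for (i in seq_len(length(x) - 1L)) {
  # pair x[i] with every element that comes after it
  chunks[[i]] <- data.frame(
    A = rep(x[i], length(x) - i),
    B = x[(i + 1L):length(x)],
    stringsAsFactors = FALSE
  )
}

# Stack the pieces into one data frame with columns A and B
df <- do.call(rbind, chunks)
nrow(df)   # 6
```

Using seq_len() rather than 1:(n - 1) avoids looping over 1:0 if an element ever has length 1.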
Your data first:
l = list(
c("#solarpanels", "#solar"),
c("#Nuclear", "#Wind", "#solar"),
"#solar",
c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint"),
c("#solar", "#wind")
)
Here's a two-liner version:
l = l[lengths(l) > 1L]
data.frame(do.call(rbind, unlist(lapply(l, combn, 2L, simplify = FALSE), recursive = FALSE)))
# X1 X2
# 1 #solarpanels #solar
# 2 #Nuclear #Wind
# 3 #Nuclear #solar
# 4 #Wind #solar
# 5 #steel #windenergy
# 6 #steel #solarenergy
# 7 #steel #carbonfootprint
# 8 #windenergy #solarenergy
# 9 #windenergy #carbonfootprint
# 10 #solarenergy #carbonfootprint
# 11 #solar #wind
More slowly, for clarity:
combn(x, k) returns every possible (unordered) subset of size k from x; what you're after is the pairs from each element of the list. By default, it returns these as a matrix with p = choose(length(x), k) columns, but that's not a helpful format for your use case; simplify = FALSE instead returns each subset as a separate element of a list.
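A small sketch comparing the two output formats, using the fourth element from the question:

```r
x <- c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint")

# Default (simplify = TRUE): a matrix with one column per pair, choose(4, 2) = 6 columns
m <- combn(x, 2L)
dim(m)          # 2 6

# simplify = FALSE: a list holding each pair as its own length-2 vector
pairs <- combn(x, 2L, simplify = FALSE)
length(pairs)   # 6
pairs[[1]]      # "#steel" "#windenergy"
```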
So lapply(l, combn, 2L, simplify = FALSE) will look something like:
# [[1]]
# [[1]][[1]]
# [1] "#solarpanels" "#solar"
#
#
# [[2]]
# [[2]][[1]]
# [1] "#Nuclear" "#Wind"
#
# [[2]][[2]]
# [1] "#Nuclear" "#solar"
(We have to filter out the length-1 elements of l first, since asking combn for 2 elements from a length-1 object is an error; hence the first line.)
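A quick sketch of both halves of that filtering step on the question's data: the error from a single string, and how lengths() picks out the elements worth keeping.

```r
l <- list(
  c("#solarpanels", "#solar"),
  c("#Nuclear", "#Wind", "#solar"),
  "#solar",
  c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint"),
  c("#solar", "#wind")
)

# Asking combn() for pairs from a single string raises an error
failed <- inherits(try(combn(l[[3]], 2L), silent = TRUE), "try-error")
failed              # TRUE

# lengths() returns each element's length; keep only those with 2+ entries
l <- l[lengths(l) > 1L]
lengths(l)          # 2 3 4 2
```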
The lapply(.) bit is the crux of your issue; the rest is just kludging the output (which already has all the correct data) into data.frame format.
First, the lapply output is nested: it's a list of lists. It's more uniform to have a list of length-2 vectors; unlist(., recursive = FALSE) accomplishes this by un-nesting the first level of lists. (With recursive = TRUE, we'd wind up with one long vector and lose the paired structure; we could work with that, but it's a bit unnatural.)
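A sketch of the difference on a hypothetical nested list shaped like the first two elements of the lapply output above:

```r
# Hypothetical nested list shaped like the lapply() output
nested <- list(
  list(c("#solarpanels", "#solar")),
  list(c("#Nuclear", "#Wind"), c("#Nuclear", "#solar"))
)

# recursive = FALSE removes only the outer level: a flat list of pairs
pairs <- unlist(nested, recursive = FALSE)
length(pairs)       # 3
pairs[[3]]          # "#Nuclear" "#solar"

# recursive = TRUE (the default) flattens everything into one character vector
flat <- unlist(nested)
length(flat)        # 6

# The pairing can still be recovered by refilling a 2-column matrix row-wise
matrix(flat, ncol = 2L, byrow = TRUE)
```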
Next, we turn the list of length-2 vectors into a matrix, with an eye to the end goal (a 2-column matrix is very easy to convert to a data.frame); list -> matrix is done in base R with do.call(rbind, .).
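What do.call(rbind, .) does, sketched on a short hypothetical list of pairs: each list element becomes a separate argument to rbind, so each pair becomes one row.

```r
pairs <- list(c("#solarpanels", "#solar"), c("#Nuclear", "#Wind"), c("#solar", "#wind"))

# Same as rbind(pairs[[1]], pairs[[2]], pairs[[3]]): one row per pair
m <- do.call(rbind, pairs)
dim(m)      # 3 2
m[2, ]      # "#Nuclear" "#Wind"
```

Writing the rbind(...) call out by hand only works when you know the number of pairs in advance; do.call builds the call from however many elements the list has.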
Finally, we pass this to data.frame, et voilà!
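The result has default column names; to get the A and B columns from the question, a minimal sketch is to rename them afterwards (stringsAsFactors = FALSE is spelled out here so the columns stay character on R versions before 4.0):

```r
m <- rbind(c("#solarpanels", "#solar"), c("#Nuclear", "#Wind"))

df <- data.frame(m, stringsAsFactors = FALSE)

# Rename the default columns to the A/B names the question asks for
names(df) <- c("A", "B")
df
```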
In data.table, I would do it slightly more cleanly and in one command:
setDT(transpose(
unlist(lapply(l[lengths(l) > 1L], combn, 2L, simplify = FALSE), recursive = FALSE)
))[]
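Here transpose() turns the list of length-2 vectors into a list of two vectors (all first elements, then all second elements), which setDT() then converts to a data.table. For readers without data.table, a rough base-R sketch of the same reshaping (an illustration using vapply, not how data.table implements it) might be:

```r
# Hypothetical toy input: a list of length-2 character vectors
pairs <- list(c("#solar", "#wind"), c("#Nuclear", "#Wind"), c("#Nuclear", "#solar"))

# "Transpose" by hand: collect the 1st elements into one vector and the 2nd into
# another; vapply checks that each extracted value is a single string
cols <- list(
  A = vapply(pairs, `[`, character(1), 1L),
  B = vapply(pairs, `[`, character(1), 2L)
)
df <- data.frame(cols, stringsAsFactors = FALSE)
df
```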
Given you likely don't care much about the intermediate output, this would also be a good place to use magrittr:
library(magrittr)
l[lengths(l) > 1L] %>%
lapply(combn, 2L, simplify = FALSE) %>%
unlist(recursive = FALSE) %>%
do.call(rbind, . ) %>%
data.frame
It's more readable, but in this case it might be nice to see up front that a data.frame is the end goal, as the intent of the unlist and do.call steps might otherwise be obscure.
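On R 4.1 or later, roughly the same pipeline can be sketched with the native |> pipe instead of magrittr (an alternative, assuming R >= 4.1, not part of the approach above). The native pipe has no . placeholder for a positional argument, so the do.call step is wrapped in an anonymous function, and setNames() adds the A/B names from the question:

```r
l <- list(
  c("#solarpanels", "#solar"),
  c("#Nuclear", "#Wind", "#solar"),
  "#solar",
  c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint"),
  c("#solar", "#wind")
)

out <- l[lengths(l) > 1L] |>
  lapply(combn, 2L, simplify = FALSE) |>
  unlist(recursive = FALSE) |>
  (\(pairs) do.call(rbind, pairs))() |>   # anonymous function stands in for "."
  data.frame(stringsAsFactors = FALSE) |>
  setNames(c("A", "B"))
nrow(out)   # 11
```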