Reputation: 99
I have a list with vectors of different length, for example:
a=c(12345,12367,91670,87276,92865)
b=c(12345,87276,89250)
c=c(12367,91670)
d=c(23753,82575,91475,10957,92865,24311)
mylist=list(a,b,c,d)
mylist
# [[1]]
# [1] 12345 12367 91670 87276 92865
#
# [[2]]
# [1] 12345 87276 89250
#
# [[3]]
# [1] 12367 91670
#
# [[4]]
# [1] 23753 82575 91475 10957 92865 24311
my question is how can I remove the vectors of this list which are a subset of another vector of the same list. i.e. in this case how can I remove the 3rd object of the list which is a subset of the 1st object??
Upvotes: 6
Views: 316
Reputation: 13122
An alternative (of the "list") format for the elements is a table of "elements-by-values":
table(values = unlist(mylist),
elt = rep(seq_along(mylist), lengths(mylist)))
# elt
#values 1 2 3 4
# 10957 0 0 0 1
# 12345 1 1 0 0
# 12367 1 0 1 0
# 23753 0 0 0 1
# 24311 0 0 0 1
# 82575 0 0 0 1
# 87276 1 1 0 0
# 89250 0 1 0 0
# 91475 0 0 0 1
# 91670 1 0 1 0
# 92865 1 0 0 1
This can, very easy, consume much memory so we could chase a sparse alternative:
l = unlist(mylist)
ul = unique(l)
tab = sparseMatrix(x = TRUE,
i = match(l, ul),
j = rep(seq_along(mylist), lengths(mylist)),
dimnames = list(ul, sprintf("elt_%d", seq_along(mylist))))
tab
#11 x 4 sparse Matrix of class "lgCMatrix"
# elt_1 elt_2 elt_3 elt_4
#12345 | | . .
#12367 | . | .
#91670 | . | .
#87276 | | . .
#92865 | . . |
#89250 . | . .
#23753 . . . |
#82575 . . . |
#91475 . . . |
#10957 . . . |
#24311 . . . |
Then, to find which element is a subset of which:
subsets = lengths(mylist) == crossprod(tab)
subsets
#4 x 4 sparse Matrix of class "lgCMatrix"
# elt_1 elt_2 elt_3 elt_4
#elt_1 | . . .
#elt_2 . | . .
#elt_3 | . | .
#elt_4 . . . |
where, here, each element is a subset of ...itself and 3 is a subset of 1. To reach the wanted info we could use:
subset(summary(subsets), i != j)[c("i", "j")]
#
# i j
#2 3 1
Or, to avoid incorporating "diag" indices and subsetting them afterwards, we could manipulate the structure already present:
dp = diff(subsets@p)
j = rep(0:(length(dp) - 1), dp)
wh = subsets@i != j
cbind(subset = subsets@i[wh], of = j[wh]) + 1L
# subset of
#[1,] 3 1
In both the last cases, the unique
of the first column shows which elements are subset of another and can be used to [
"mylist".
Upvotes: 1
Reputation: 18425
This gives a new list without the elements that are subsets of others...
newlist <- mylist[!sapply(seq_along(mylist), function(i) max(sapply(mylist[-i],function(L) all(mylist[[i]] %in% L))))]
Upvotes: 3
Reputation: 38500
Here is another method. It is also not super efficient, but will return the position of nested lists.
# get ordered pairwise combinations of list positions
combos <- combn(1:4, 2)
combos <- cbind(combos, combos[2:1,])
Ordered is important, because the comparison is not symmetric. Now, loop through these combinations and compare sets using intersect
.
combos[1, sapply(seq_len(ncol(combos)),
function(i) setequal(intersect(mylist[[combos[1,i]]], mylist[[combos[2,i]]]),
mylist[[combos[1,i]]]))]
[1] 3
rewriting the last line to use mapply
rather than sapply
may improve readability.
combos[1, mapply(function(x, y) setequal(intersect(mylist[[x]], mylist[[y]]), mylist[[x]]),
combos[1,], combos[2,])]
[1] 3
Upvotes: 1
Reputation: 32548
which(t(sapply(seq_along(mylist), function(i)
sapply(mylist[-i], function(a)
all(unlist(mylist[i]) %in% a)))), arr.ind = TRUE)
# row col
#[1,] 3 1
#Suggests that 3rd item is contained within 1st item
Upvotes: 2
Reputation: 206177
This is may be pretty inefficient, but if your list isn't that big it could work
find_nested <- function(mylist) {
mm <- sapply(mylist, function(x) sapply(mylist, function(y) all(x %in%y)))
diag(mm) <- FALSE
apply(mm,2,any)
}
This can tell you which vectors are subsets of other vectors. It does this by comparing every vector to every other vector.
find_nested(mylist)
# [1] FALSE FALSE TRUE FALSE
So we can see the third item is contained in another list.
Upvotes: 3