Reputation: 2400
I need to combine interconnected list elements to form distinct elements in base R with no additional packages required (while removing NA and zero-length elements).
Edit: I look for flexibility of data types (character, numeric etc), lists below augmented.
The lists:
list(c(1),c(1,2),c(1,2),c(2,3,4),c(8,9,10),c(10,11),c(NA))
list(c("a"),c("a","b"),c("a","b"),c("b","c","d"),c("h","i","j"),c("j","k"),c(NA))
...should become:
list(c(1,2,3,4),c(8,9,10,11))
list(c("a","b","c","d"),c("h","i","j","k"))
This is my solution so far which seem to work well:
ml <- list(c(1),c(1,2),c(1,2),c(2,3,4),c(8,9,10),c(10,11),c(NA))
# Remove duplicates
ml <- ml[!duplicated(ml)]
# Combine connected list elements
for (l1 in seq(1,length(ml))){
for (l2 in seq(1,length(ml))){
if (l1 != l2){
if(any(ml[[l2]] %in% ml[[l1]])){
ml[[l1]] <- sort(unique(c(ml[[l1]],ml[[l2]])))
ml[[l2]] <- NA
}
}
}
}
# Clean up list (remove NA, and zero-length elements)
ml <- Filter(Negate(anyNA), ml)
ml <- ml[lapply(ml, length)>0]
Is there any way to achieve the required result with fewer lines of code?
Upvotes: 3
Views: 139
Reputation: 102519
If you are interested in working with base R only but with shorter code, you can try
tcrossprod
+ %*%
l <- Filter(Negate(anyNA), lst)
M <- tcrossprod(table(stack(setNames(l, paste0("x", seq_along(l))))))
p <- M > 0
repeat {
q <- (p %*% M) > 0
if (identical(p, q)) break
p <- q
}
k <- order(p[, 1])
out <- apply(unique(p[k, k]), 1, \(x) as.integer(names(which(x))))
and you will see
> out
$`2`
[1] 2
$`4`
[1] 4
$`1`
[1] 1 5 7 9 10 11 12 14 17 18 20
repeat
+ Reduce
out <- list()
l <- Filter(negate(anyNA), lst)
repeat {
if (!length(l)) break
v <- unique(unlist(l))
sz <- 0
repeat{
p <- Reduce(\(x, y) c(x, if (any(y %in% x)) setdiff(y, x)), l)
l <- lapply(l, \(x) if (all(x %in% p)) p else x)
if (length(p) == sz) break
sz <- length(p)
}
out <- c(out, list(p))
l <- Filter(\(x) !any(x %in% p), l)
}
gives
> out
[[1]]
[1] 11 14 18 1 5 9 20 17 10 7 12
[[2]]
[1] 4
[[3]]
[1] 2
Given a "tough" list for example
set.seed(0)
lst <- sample(c(rep(NA, 3), replicate(7, list(sample(20, sample(5, 1))))))
> lst
[[1]]
[1] NA
[[2]]
[1] 11 14 18 1 5
[[3]]
[1] NA
[[4]]
[1] 5 9 14 20 17
[[5]]
[1] 10 12
[[6]]
[1] 4
[[7]]
[1] 2
[[8]]
[1] 10 14 20 7 9
[[9]]
[1] 20
[[10]]
[1] NA
Upvotes: 2
Reputation: 270045
There are 6 basic vector data types in R: logical, numeric, integer, complex, character and raw (see typeof) and this should work with all except possibly raw. It creates a 2 column data frame edges
each of whose rows represents an edge and from that an igraph object g
. At the end we show the graph.
library(igraph)
L <- list(c(1),c(1,2),c(1,2),c(2,3,4),c(8,9,10),c(10,11),c(NA))
L2 <- Map(\(x) unique(na.omit(x)), L)
edges <- L2 |>
Filter(f = \(x) length(x) > 1) |>
Map(f = \(x) data.frame(from = head(x, -1), to = tail(x, -1))) |>
do.call(what = "rbind") |>
unique()
g <- graph_from_data_frame(edges, vertices = unique(unlist(L2)), directed = FALSE)
cl <- components(g)$membership
s <- split( as(names(cl), typeof(unlist(L))), cl)
str(s)
## List of 2
## $ 1: num [1:4] 1 2 3 4
## $ 2: num [1:4] 8 9 10 11
Image of graph follows. Note that it has an edge for each element to the next within each component of L with duplicates removed.
set.seed(123)
plot(g)
Upvotes: 5
Reputation: 102519
There are already a bunch of excellent igraph solutions. Below is just another option, using graph_from_biadjacency_matrix
and bipartite_projection
library(igraph)
lst %>%
Filter(f = Negate(anyNA)) %>%
setNames(paste0("x", seq_along(.))) %>%
stack() %>%
table() %>%
graph_from_biadjacency_matrix() %>%
bipartite_projection(which = "false") %>%
decompose() %>%
map(~ as.integer(names(V(.x))))
and you will obtain
[[1]]
[1] 1 2 3 4
[[2]]
[1] 8 9 10 11
lst <- list(c(1), c(1, 2), c(1, 2), c(2, 3, 4), c(8, 9, 10), c(10, 11), NA)
Let's take a look how bipartite graph makes things work. Given a graph g
obtained from below
g <- lst %>%
Filter(f = Negate(anyNA)) %>%
setNames(paste0("x", seq_along(.))) %>%
stack() %>%
table() %>%
graph_from_biadjacency_matrix()
g %>%
plot(layout = layout_as_bipartite, vertex.color = factor(V(.)$type))
we see
Then, using bipartite_projection
g %>%
bipartite_projection(which = "false") %>%
plot()
where which = "false"
indicates the vertices as shown in lst
, otherwise which = "true"
denotes the vertices of names starting with x
.
Upvotes: 2
Reputation: 52319
It might be easier to work with data.frames on that problem. Here's a solution using igraph
and working on all data types:
library(igraph)
dat <-
setNames(l, paste0('node', 1:length(l))) |> stack() |> na.omit()
dat |>
graph_from_data_frame(dir = FALSE) |>
components() |>
getElement('membership') |>
stack() |>
subset(!grepl("node", ind)) |>
aggregate(ind ~ values, FUN = toString)
# values ind
#1 1 1, 2, 3, 4
#2 2 8, 9, 10, 11
Upvotes: 5
Reputation: 73562
You could try to unlist
and sort
(which removes NA
), then split
along cumulated diff
erences.
> u <- sort(unique(unlist(lst)))
> split(u, cumsum(c(1, diff(u)) > 1)) |> unname()
[[1]]
[1] 1 2 3 4
[[2]]
[1] 8 9 10 11
[[3]]
[1] 20 21 22 23
Data:
Slightly expanded.
> dput(lst)
list(1, c(1, 2), c(1, 2), c(2, 3, 4), c(8, 9, 10), c(10, 11),
20:23, NA)
Upvotes: 4