Peter
Peter

Reputation: 2400

Combine connected list elements to form distinct list elements

I need to combine interconnected list elements to form distinct elements in base R with no additional packages required (while removing NA and zero-length elements).

Edit: I look for flexibility of data types (character, numeric etc), lists below augmented.

The lists:

list(c(1),c(1,2),c(1,2),c(2,3,4),c(8,9,10),c(10,11),c(NA))
list(c("a"),c("a","b"),c("a","b"),c("b","c","d"),c("h","i","j"),c("j","k"),c(NA))

...should become:

list(c(1,2,3,4),c(8,9,10,11))
list(c("a","b","c","d"),c("h","i","j","k"))

This is my solution so far which seem to work well:

ml <- list(c(1),c(1,2),c(1,2),c(2,3,4),c(8,9,10),c(10,11),c(NA))
# Remove duplicates
ml <- ml[!duplicated(ml)]
# Combine connected list elements
for (l1 in seq(1,length(ml))){
  for (l2 in seq(1,length(ml))){
    if (l1 != l2){
      if(any(ml[[l2]] %in% ml[[l1]])){
        ml[[l1]] <- sort(unique(c(ml[[l1]],ml[[l2]])))
        ml[[l2]] <- NA
      }
    }
  }
}
# Clean up list (remove NA, and zero-length elements)
ml <- Filter(Negate(anyNA), ml)
ml <- ml[lapply(ml, length)>0]

Is there any way to achieve the required result with fewer lines of code?

Upvotes: 3

Views: 139

Answers (5)

ThomasIsCoding
ThomasIsCoding

Reputation: 102519

If you are interested in working with base R only but with shorter code, you can try

  • Option 1: Use tcrossprod + %*%
l <- Filter(Negate(anyNA), lst)
M <- tcrossprod(table(stack(setNames(l, paste0("x", seq_along(l))))))
p <- M > 0
repeat {
    q <- (p %*% M) > 0
    if (identical(p, q)) break
    p <- q
}
k <- order(p[, 1])
out <- apply(unique(p[k, k]), 1, \(x) as.integer(names(which(x))))

and you will see

> out
$`2`
[1] 2

$`4`
[1] 4

$`1`
 [1]  1  5  7  9 10 11 12 14 17 18 20
  • Option 2: Use repeat + Reduce
out <- list()
l <- Filter(negate(anyNA), lst)
repeat {
    if (!length(l)) break
    v <- unique(unlist(l))
    sz <- 0
    repeat{
        p <- Reduce(\(x, y) c(x, if (any(y %in% x)) setdiff(y, x)), l)
        l <- lapply(l, \(x) if (all(x %in% p)) p else x)
        if (length(p) == sz) break
        sz <- length(p)
    }
    out <- c(out, list(p))
    l <- Filter(\(x) !any(x %in% p), l)
}

gives

> out
[[1]]
 [1] 11 14 18  1  5  9 20 17 10  7 12

[[2]]
[1] 4

[[3]]
[1] 2

data


Given a "tough" list for example

set.seed(0)
lst <- sample(c(rep(NA, 3), replicate(7, list(sample(20, sample(5, 1))))))

> lst
[[1]]
[1] NA

[[2]]
[1] 11 14 18  1  5

[[3]]
[1] NA

[[4]]
[1]  5  9 14 20 17

[[5]]
[1] 10 12

[[6]]
[1] 4

[[7]]
[1] 2

[[8]]
[1] 10 14 20  7  9

[[9]]
[1] 20

[[10]]
[1] NA

Upvotes: 2

G. Grothendieck
G. Grothendieck

Reputation: 270045

There are 6 basic vector data types in R: logical, numeric, integer, complex, character and raw (see typeof) and this should work with all except possibly raw. It creates a 2 column data frame edges each of whose rows represents an edge and from that an igraph object g. At the end we show the graph.

library(igraph)

L <- list(c(1),c(1,2),c(1,2),c(2,3,4),c(8,9,10),c(10,11),c(NA))

L2 <- Map(\(x) unique(na.omit(x)), L)
edges <- L2 |>
  Filter(f = \(x) length(x) > 1) |>
  Map(f = \(x) data.frame(from = head(x, -1), to = tail(x, -1))) |>
  do.call(what = "rbind") |>
  unique()

g <- graph_from_data_frame(edges, vertices = unique(unlist(L2)), directed = FALSE)

cl <- components(g)$membership
s <- split( as(names(cl), typeof(unlist(L))), cl)

str(s)
## List of 2
##  $ 1: num [1:4] 1 2 3 4
##  $ 2: num [1:4] 8 9 10 11

Image of graph follows. Note that it has an edge for each element to the next within each component of L with duplicates removed.

set.seed(123)
plot(g)

screenshot

Upvotes: 5

ThomasIsCoding
ThomasIsCoding

Reputation: 102519

There are already a bunch of excellent solutions. Below is just another option, using graph_from_biadjacency_matrix and bipartite_projection

library(igraph)

lst %>%
    Filter(f = Negate(anyNA)) %>%
    setNames(paste0("x", seq_along(.))) %>%
    stack() %>%
    table() %>%
    graph_from_biadjacency_matrix() %>%
    bipartite_projection(which = "false") %>%
    decompose() %>%
    map(~ as.integer(names(V(.x))))

and you will obtain

[[1]]
[1] 1 2 3 4

[[2]]
[1]  8  9 10 11

data

lst <- list(c(1), c(1, 2), c(1, 2), c(2, 3, 4), c(8, 9, 10), c(10, 11), NA)

Let's take a look how bipartite graph makes things work. Given a graph g obtained from below

g <- lst %>%
    Filter(f = Negate(anyNA)) %>%
    setNames(paste0("x", seq_along(.))) %>%
    stack() %>%
    table() %>%
    graph_from_biadjacency_matrix()


g %>%
    plot(layout = layout_as_bipartite, vertex.color = factor(V(.)$type))

we see

enter image description here

Then, using bipartite_projection

g %>%
    bipartite_projection(which = "false") %>%
    plot()

we see enter image description here

where which = "false" indicates the vertices as shown in lst, otherwise which = "true" denotes the vertices of names starting with x.

Upvotes: 2

Ma&#235;l
Ma&#235;l

Reputation: 52319

It might be easier to work with data.frames on that problem. Here's a solution using igraph and working on all data types:

library(igraph)

dat <- 
  setNames(l, paste0('node', 1:length(l))) |> stack() |> na.omit()

dat |>
  graph_from_data_frame(dir = FALSE) |>
  components() |>
  getElement('membership') |>
  stack() |>
  subset(!grepl("node", ind)) |>
  aggregate(ind ~ values, FUN = toString)

#  values          ind
#1      1   1, 2, 3, 4
#2      2 8, 9, 10, 11

Upvotes: 5

jay.sf
jay.sf

Reputation: 73562

You could try to unlist and sort (which removes NA), then split along cumulated differences.

> u <- sort(unique(unlist(lst)))
> split(u, cumsum(c(1, diff(u)) > 1)) |> unname()
[[1]]
[1] 1 2 3 4

[[2]]
[1]  8  9 10 11

[[3]]
[1] 20 21 22 23

Data:

Slightly expanded.

> dput(lst)
list(1, c(1, 2), c(1, 2), c(2, 3, 4), c(8, 9, 10), c(10, 11), 
    20:23, NA)

Upvotes: 4

Related Questions