R - joining more than 2^31 rows with data.table

Question

I have an igraph network graph with 103,887 nodes and 4,795,466 ties.

This can be structured as an edgelist in a data.table with almost 9 million rows.

I can find the common neighbors in this network, following @chinsoon12's answer here. See the example below.

This works beautifully for smaller networks, but runs into problems in my use-case because the merge results in more than 2^31 rows.
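The size of the join can be estimated before running it. Because the self-join is on V2, every V2 value that appears n times contributes n^2 rows (before the V1 < i.V1 filter), so the total is the sum of squared group sizes. A minimal sketch, assuming a hypothetical toy edgelist in place of the real network:

```r
library(data.table)

# Hypothetical toy edgelist (not the real network): both directions of each tie
adjDT <- data.table(
  V1 = c(4, 5, 10, 8, 8, 8),
  V2 = c(8, 8, 8, 4, 5, 10)
)

# Rows produced by adjDT[adjDT, on = "V2"]: sum over V2 of (group size)^2.
# as.numeric() avoids integer overflow when the counts are large.
join_rows <- adjDT[, .N, by = V2][, sum(as.numeric(N)^2)]

# TRUE if the join would not fit in a single data.table
exceeds_limit <- join_rows > .Machine$integer.max
```

If `exceeds_limit` is TRUE for the real edgelist, the join cannot be materialised in one piece and has to be split or avoided.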

Questions:

Are there efficient alternatives for dealing with this?
Can I split the data and do the computation in steps? The results will be used to look up common neighbors.
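One way the splitting could work, as a sketch rather than a tested solution for the full network: chunk the inner table by its V1 values, so that every (V1, i.V1) pair is fully contained in exactly one chunk and the per-pair aggregation stays correct. The function name `common_neighbours_chunked` and the `chunk_size` parameter are hypothetical, and the toy edgelist stands in for the real data.

```r
library(data.table)

# Hypothetical toy edgelist (both directions of each tie)
adjDT <- data.table(
  V1 = c(4, 5, 10, 8, 8, 8),
  V2 = c(8, 8, 8, 4, 5, 10)
)

# Run the self-join one block of nodes at a time.
# Each chunk holds all rows of the inner table for a subset of nodes,
# so all shared neighbours of a given pair end up in the same chunk.
common_neighbours_chunked <- function(dt, chunk_size = 2L) {
  nodes  <- sort(unique(dt$V1))
  chunks <- split(nodes, ceiling(seq_along(nodes) / chunk_size))
  lapply(chunks, function(ids) {
    dt[dt[V1 %in% ids], on = "V2", nomatch = 0, allow.cartesian = TRUE
    ][V1 < i.V1, .(Neighbours = paste(V2, collapse = ",")),
      by = c("V1", "i.V1")]
  })
}

parts <- common_neighbours_chunked(adjDT)

# Only bind the parts when the combined result is small enough
res <- rbindlist(parts)[order(V1, i.V1)]
```

For the real network the combined result may itself be too large for one table, so each element of `parts` would more likely be written to its own file (for example with `fwrite`) instead of being bound together.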

Example - modified from @chinsoon12's answer:

library(data.table)
library(igraph)

set.seed(1234)
g <- random.graph.game(10, p=0.10)

adjSM <- as(get.adjacency(g), "dgTMatrix")
adjDT <- data.table(V1=adjSM@i+1, V2=adjSM@j+1)

res <- adjDT[adjDT, nomatch=0, on="V2", allow.cartesian=TRUE
][V1 < i.V1, .(Neighbours=paste(V2, collapse=",")),
  by=c("V1","i.V1")][order(V1)]

res

   V1 i.V1 Neighbours
1:  4    5          8
2:  4   10          8
3:  5   10          8

ThomasIsCoding · Accepted Answer

Update

If you only need to query common neighbors, I would not recommend building a huge look-up table. Instead, you can use the following function to get the result for a given query:

find_common_neighbors <- function(g, Vs) {
  # a node is a common neighbor if it is at distance 1 from every node in Vs
  which(colSums(distances(g, Vs) == 1) == length(Vs))
}

such that

> find_common_neighbors(g, c(4, 8))
integer(0)

> find_common_neighbors(g, c(4, 5))
[1] 8
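The same query could also be answered by intersecting the neighbor sets directly, which avoids computing shortest-path distances to every node. This is a variant, not the code above; the function name `common_neighbors_of` and the small hand-built graph are assumptions for illustration.

```r
library(igraph)

# Hypothetical toy graph: 10 nodes, with node 8 tied to nodes 4, 5 and 10
g <- make_graph(c(4, 8, 5, 8, 10, 8), n = 10, directed = FALSE)

# Intersect the neighbor sets of all queried nodes
common_neighbors_of <- function(g, Vs) {
  nbrs <- lapply(Vs, function(v) as.integer(neighbors(g, v)))
  Reduce(intersect, nbrs)
}

shared_4_5 <- common_neighbors_of(g, c(4, 5))  # node 8 is tied to both
shared_4_8 <- common_neighbors_of(g, c(4, 8))  # no shared neighbor
```

Each query only touches the adjacency lists of the nodes in `Vs`, so its cost depends on their degrees rather than on the size of the graph.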

If you need a look-up table, an alternative is to key it by the shared neighbor: each row holds one node with at least two neighbors (Neighbours) and the list of nodes adjacent to it (Nodes), e.g.,

res <- transform(
  data.frame(Neighbours = which(degree(g) >= 2)),
  Nodes = sapply(
    Neighbours,
    function(x) toString(neighbors(g, x))
  )
)
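If the edgelist is already in a data.table, a comparable look-up can be sketched without igraph by keying the edgelist on V1 and intersecting the two neighbor columns at query time. This is an alternative under stated assumptions, not part of the answer above: the helper `common_from_edgelist` and the toy edgelist are hypothetical.

```r
library(data.table)

# Hypothetical toy edgelist (both directions of each tie)
adjDT <- data.table(
  V1 = c(4, 5, 10, 8, 8, 8),
  V2 = c(8, 8, 8, 4, 5, 10)
)
setkey(adjDT, V1)  # keyed look-up uses binary search on V1

# Common neighbors of nodes a and b: intersect their neighbor lists
common_from_edgelist <- function(dt, a, b) {
  intersect(dt[.(a), V2, nomatch = 0], dt[.(b), V2, nomatch = 0])
}

pair_4_5 <- common_from_edgelist(adjDT, 4, 5)
pair_4_8 <- common_from_edgelist(adjDT, 4, 8)
```

The table never grows beyond the edgelist itself, so the 2^31 row limit is not reached; the trade-off is that each pair is computed on demand instead of being stored.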

Previous Answer

I think you can call ego() on g directly to generate res, e.g.,

setNames(
  data.frame(
    t(do.call(
      cbind,
      lapply(
        Filter(function(x) length(x) > 2, ego(g, 1)),
        function(x) {
          rbind(combn(x[-1], 2), x[1])
        }
      )
    ))
  ),
  c("V1", "V2", "Neighbours")
)

which gives

  V1 V2 Neighbours
1  4  5          8
2  4 10          8
3  5 10          8
