Stephen Turner

Reputation: 2604

Simple network / cluster membership from two-column data frame

I have a two column data frame that looks like this:


# What I have
data.frame(id1=c("a", "a", "a", "j", "x", "x"), 
           id2=c("b", "c", "d", "k", "y", "z"))
#>   id1 id2
#> 1   a   b
#> 2   a   c
#> 3   a   d
#> 4   j   k
#> 5   x   y
#> 6   x   z

Each row links two IDs. Here a, b, c, and d all belong to the same "family" or "group"; j and k form a second group; and x, y, and z form a third.

What I want is a data frame that assigns an arbitrary group ID to each individual based on the links above. In this example, a-d are put into group 1, j-k into group 2, and x-z into group 3.

I'd also like to show the number of individuals in each group, but once I have the group ID I can easily add number_in_group with dplyr::add_count(group).


# What I want
data.frame(id=c("a", "b", "c", "d", "j", "k", "x", "y", "z"), 
           group=c(1,1,1,1,2,2,3,3,3), 
           number_in_group=c(4,4,4,4,2,2,3,3,3))
#>   id group number_in_group
#> 1  a     1               4
#> 2  b     1               4
#> 3  c     1               4
#> 4  d     1               4
#> 5  j     2               2
#> 6  k     2               2
#> 7  x     3               3
#> 8  y     3               3
#> 9  z     3               3

Upvotes: 2

Views: 595

Answers (1)

akuiper

Reputation: 214967

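You can treat each row as an edge of a graph and find its connected components with igraph. The result's membership gives each ID's group, and the cluster size (csize) gives the number of individuals in each group: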

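library(dplyr); library(igraph)

# Each row of df is an edge between two IDs
df <- data.frame(id1=c("a", "a", "a", "j", "x", "x"), 
                 id2=c("b", "c", "d", "k", "y", "z"))

# Connected components of the graph; components() and graph_from_data_frame()
# are the current igraph names for clusters() and graph.data.frame()
comps <- components(graph_from_data_frame(df))

with(comps, 
    data.frame(
        id = names(membership), 
        group = membership, 
        number_in_group = csize[membership],
        row.names = NULL  # drop the names carried over from membership
    )
) %>% arrange(group)

#  id group number_in_group
#1  a     1               4
#2  b     1               4
#3  c     1               4
#4  d     1               4
#5  j     2               2
#6  k     2               2
#7  x     3               3
#8  y     3               3
#9  z     3               3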
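If you would rather not depend on igraph, the same grouping can be sketched in base R with a small union-find. This is only an illustration under stated assumptions, not a replacement for the igraph approach: `group_ids` and `find_root` are hypothetical helper names, and it assumes neither column contains NA. It is fine for small inputs but has not been tuned for large ones.

```r
# Hypothetical helper: label connected components with a simple union-find.
# Assumes id1 and id2 have the same length and contain no NA values.
group_ids <- function(id1, id2) {
  id1 <- as.character(id1)
  id2 <- as.character(id2)
  ids <- unique(c(id1, id2))
  a_idx <- match(id1, ids)
  b_idx <- match(id2, ids)
  parent <- seq_along(ids)  # every id starts as the root of its own group

  # Follow parent links until reaching a node that is its own parent
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Merge the groups at the two ends of every edge
  for (k in seq_along(a_idx)) {
    ra <- find_root(a_idx[k])
    rb <- find_root(b_idx[k])
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  roots <- vapply(seq_along(ids), find_root, integer(1))
  group <- match(roots, unique(roots))  # relabel roots as 1, 2, 3, ...
  out <- data.frame(id = ids,
                    group = group,
                    number_in_group = tabulate(group)[group],
                    stringsAsFactors = FALSE)
  out <- out[order(out$group, out$id), ]
  rownames(out) <- NULL
  out
}

df <- data.frame(id1 = c("a", "a", "a", "j", "x", "x"),
                 id2 = c("b", "c", "d", "k", "y", "z"))
res <- group_ids(df$id1, df$id2)
res
```

Groups are numbered in order of first appearance in c(id1, id2), so the labels are arbitrary, as the question asks.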

Upvotes: 4
