Reputation: 2604
I have a two column data frame that looks like this:
# What I have
data.frame(id1=c("a", "a", "a", "j", "x", "x"),
           id2=c("b", "c", "d", "k", "y", "z"))
#>   id1 id2
#> 1   a   b
#> 2   a   c
#> 3   a   d
#> 4   j   k
#> 5   x   y
#> 6   x   z
Each row links two IDs. In this case, a, b, c, and d all belong to the same "family" or "group"; j and k form a second group; and x, y, and z form a third.
What I want is a data frame that assigns an arbitrary group ID to every individual based on the links above. In this example, a-d are put into group 1, j-k into group 2, and x-z into group 3.
I'd also like to show the number of individuals in each group, but once I have the group ID I can easily add number_in_group with dplyr::add_count(group).
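For illustration, here is a minimal sketch of that last step. It assumes the group IDs already exist; the `grouped` data frame is hand-built for this example and is not the output of any grouping method.

```r
library(dplyr)

# Hypothetical input: group IDs assigned by hand, for illustration only.
grouped <- data.frame(
  id    = c("a", "b", "c", "d", "j", "k", "x", "y", "z"),
  group = c(1, 1, 1, 1, 2, 2, 3, 3, 3)
)

# add_count() appends the number of rows sharing each value of `group`;
# `name` sets the name of the new column.
result <- add_count(grouped, group, name = "number_in_group")
result
```

So the open problem is only how to derive the group column from the id1/id2 pairs.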
# What I want
data.frame(id=c("a", "b", "c", "d", "j", "k", "x", "y", "z"),
           group=c(1,1,1,1,2,2,3,3,3),
           number_in_group=c(4,4,4,4,2,2,3,3,3))
#>   id group number_in_group
#> 1  a     1               4
#> 2  b     1               4
#> 3  c     1               4
#> 4  d     1               4
#> 5  j     2               2
#> 6  k     2               2
#> 7  x     3               3
#> 8  y     3               3
#> 9  z     3               3
Upvotes: 2
Views: 595
Reputation: 214967
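Treat each row as an edge in a graph; the groups are then the graph's connected components. igraph reports which component each vertex belongs to (membership) and the size of each component (csize), so you can build the result directly from those two: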
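library(dplyr); library(igraph)

df <- data.frame(id1=c("a", "a", "a", "j", "x", "x"),
                 id2=c("b", "c", "d", "k", "y", "z"))

# Each row of df is an edge; the connected components are the groups.
# components() and graph_from_data_frame() are the current names for
# clusters() and graph.data.frame(), which are deprecated in recent igraph.
comp <- components(graph_from_data_frame(df, directed = FALSE))

with(comp,
  data.frame(
    id = names(membership),
    group = membership,
    number_in_group = csize[membership],
    row.names = NULL  # use 1..n instead of the vertex names as row names
  )
) %>% arrange(group)
#   id group number_in_group
# 1  a     1               4
# 2  b     1               4
# 3  c     1               4
# 4  d     1               4
# 5  j     2               2
# 6  k     2               2
# 7  x     3               3
# 8  y     3               3
# 9  z     3               3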
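Because the groups come from connected components, IDs that are linked only indirectly should also end up together. The sketch below uses a made-up edge list (`chain`, not from the question) in which a and c never share a row but are both linked to b:

```r
library(igraph)

# Hypothetical edge list: a-b and b-c form a chain; m-n is a separate pair.
chain <- data.frame(id1 = c("a", "b", "m"),
                    id2 = c("b", "c", "n"))

comp <- components(graph_from_data_frame(chain, directed = FALSE))

# One row per vertex, with its component ID and that component's size.
groups <- data.frame(id = names(comp$membership),
                     group = unname(comp$membership),
                     number_in_group = comp$csize[comp$membership],
                     row.names = NULL)
groups
```

Here a, b, and c share one group ID, which a simple group-by on id1 would not give you.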
Upvotes: 4