Malta

Reputation: 1983

Create a group index for values connected directly and indirectly

I would like to generate an index that groups observations based on two columns. Two observations should belong to the same group whenever they share at least one value, either directly or through a chain of other observations.

In the data below, I want to check if values in 'G1' and 'G2' are connected directly (appear on the same row), or indirectly via other intermediate values. The desired grouping variable is shown in 'g'.

For example, A is directly linked to Z (row 1) and X (row 2). A is indirectly linked to B via X (A -> X -> B), and further linked to Y via X and B (A -> X -> B -> Y).

dt <- data.frame(id = 1:10,
                 G1 = c("A","A","B","B","C","C","C","D","E","F"),
                 G2 = c("Z","X","X","Y","W","V","U","s","T","T"),
                 g = c(1,1,1,1,2,2,2,3,4,4))

dt
#    id G1 G2 g
# 1   1  A  Z 1
# 2   2  A  X 1
# 3   3  B  X 1
# 4   4  B  Y 1
# 5   5  C  W 2
# 6   6  C  V 2
# 7   7  C  U 2
# 8   8  D  s 3
# 9   9  E  T 4
# 10 10  F  T 4

I tried group_indices from dplyr, but could not get it to produce this grouping.

Upvotes: 17

Views: 1032

Answers (1)

zx8754

Reputation: 56189

Use igraph to get the connected-component membership of each value, then map it back onto the rows by name:

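library(igraph)

# assumption: df1 is the question's data (dt) without the desired 'g' column
df1 <- dt[, c("id", "G1", "G2")]

# convert to a graph (G1 and G2 are the edge endpoints, id is an edge attribute)
# and get the connected-component membership ids
g <- graph_from_data_frame(df1[, c(2, 3, 1)])
myGroups <- components(g)$membership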

myGroups 
# A B C D E F Z X Y W V U s T 
# 1 1 2 3 4 4 1 1 1 2 2 2 3 4 

# then map on names
df1$group <- myGroups[df1$G1]


df1
#    id G1 G2 group
# 1   1  A  Z     1
# 2   2  A  X     1
# 3   3  B  X     1
# 4   4  B  Y     1
# 5   5  C  W     2
# 6   6  C  V     2
# 7   7  C  U     2
# 8   8  D  s     3
# 9   9  E  T     4
# 10 10  F  T     4
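
If adding igraph as a dependency is not an option, the same connected-components idea can be sketched in base R with a small union-find. This is only an illustrative sketch, not part of the original answer: the helper name group_connected is hypothetical, and, as in the igraph version, it assumes that an identical string appearing in both G1 and G2 refers to the same value.

```r
# Question's data, without the desired 'g' column
dt <- data.frame(id = 1:10,
                 G1 = c("A","A","B","B","C","C","C","D","E","F"),
                 G2 = c("Z","X","X","Y","W","V","U","s","T","T"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: label each row with the id of the connected
# component that its two values belong to (simple union-find, base R only)
group_connected <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  nodes  <- unique(c(a, b))
  parent <- seq_along(nodes)          # every value starts as its own root

  # follow parent links until reaching a root
  find_root <- function(i) {
    while (parent[[i]] != i) i <- parent[[i]]
    i
  }

  ia <- match(a, nodes)
  ib <- match(b, nodes)

  # each row is an edge: merge the components of its two endpoints
  for (k in seq_along(ia)) {
    ra <- find_root(ia[k])
    rb <- find_root(ib[k])
    if (ra != rb) parent[[max(ra, rb)]] <- min(ra, rb)
  }

  # root of each row's G1 value, renumbered 1, 2, ... in order of appearance
  roots <- vapply(ia, find_root, integer(1))
  match(roots, unique(roots))
}

dt$group <- group_connected(dt$G1, dt$G2)
dt
```

For data of this size the loop is fine. On large tables igraph's components() is likely the better choice, since this sketch does no path compression.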

Upvotes: 20
