Reputation: 179
I have the following data frame in R, where the two columns a_sno and b_sno contain overlapping values:
a_sno <- c(4,5,5,6,6,7,9,9,10,10,10,11,13,13,13,14,14,15,21,21,21,22,23,23,24,25,183,184,185,185,200)
b_sno <- c(5,4,6,5,7,6,10,13,9,13,14,15,9,10,14,10,13,11,22,23,24,21,21,25,21,23,185,185,183,184,200)
df <- data.frame(a_sno, b_sno)
If you take a close look at the data, you can see that 4, 5, 6 and 7 intersect/overlap, so I need to put them into a group called 1. Likewise, 9, 10, 13 and 14 go into group 2, 11 and 15 into group 3, and so on. 200 does not intersect with any other row but still needs to be assigned its own group.
The resulting output should look like this:
---------
group|sno
---------
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15
4 | 21
4 | 22
4 | 23
4 | 24
4 | 25
5 | 183
5 | 184
5 | 185
6 | 200
Any help to get this done is much appreciated. Thanks
Upvotes: 3
Views: 518
Reputation: 21443
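Probably not the most efficient solution, but you could treat this as a graph problem: each (a_sno, b_sno) pair is an edge, and each group is a connected component of the resulting graph.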
# Sort each row so that (a, b) and (b, a) become the same pair, then drop duplicates
df <- unique(t(apply(df, 1, sort)))
# Load the library
library(igraph)
# Build a graph from the edge list (graph.data.frame() in older igraph versions)
graph <- graph_from_data_frame(as.data.frame(df))
# Split it into connected components (decompose.graph() in older igraph versions)
components <- decompose(graph)
# Collect the vertices of each subgraph, tagged with the component index
result <- lapply(seq_along(components), function(i) {
  vertex <- as.numeric(V(components[[i]])$name)
  cbind(rep(i, length(vertex)), vertex)
})
# Build the final data frame
output <- as.data.frame(do.call(rbind, result))
colnames(output) <- c("group", "sno")
output
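If you only need the group labels and not the subgraphs themselves, a shorter variant is to ask igraph for the component membership of each vertex directly. This is a minimal sketch, assuming igraph 1.0 or later (for graph_from_data_frame() and components()). The name edges is just an illustrative variable, and the group ids are whatever igraph assigns, so don't rely on them matching a particular numbering.

```r
library(igraph)

# Data from the question
a_sno <- c(4,5,5,6,6,7,9,9,10,10,10,11,13,13,13,14,14,15,21,21,21,22,23,23,24,25,183,184,185,185,200)
b_sno <- c(5,4,6,5,7,6,10,13,9,13,14,15,9,10,14,10,13,11,22,23,24,21,21,25,21,23,185,185,183,184,200)
edges <- data.frame(a_sno, b_sno)  # 'edges' is an illustrative name

# Undirected graph: each row is an edge, each distinct sno becomes a vertex.
# Duplicate pairs and the 200-200 self-loop do not affect connectivity.
g <- graph_from_data_frame(edges, directed = FALSE)

# components() returns one component id per vertex, in vertex order
memb <- components(g)$membership

output <- data.frame(group = as.integer(memb),
                     sno   = as.numeric(V(g)$name))
output <- output[order(output$group, output$sno), ]
rownames(output) <- NULL
output
```

Because the graph is built as undirected, there is no need to sort and de-duplicate the rows first. The self-loop on 200 keeps that value in the graph as a component of its own.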
Upvotes: 3