Reputation: 88
Suppose I have a data frame with two columns, P1 and P2. I want to add a new column called ID such that rows sharing the same P1 value get the same ID, and rows sharing the same P2 value also get the same ID.
P1 P2
a 1
a 1
a 2
b 2
c 3
c 4
So, I want to get the ID column as below:
P1 P2 ID
a 1 1
a 1 1
a 2 1
b 2 1
c 3 2
c 4 2
How can I do this in R?
Upvotes: 1
Views: 152
Reputation: 37661
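One way to get this is to treat your data as a graph: each distinct value of P1 or P2 is a vertex, and each row is an edge joining its P1 value to its P2 value. The IDs are then the connected components of that graph.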
IDs = read.table(text="P1 P2
a 1
a 1
a 2
b 2
c 3
c 4",
header=TRUE, stringsAsFactors=FALSE)
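library(igraph)
# Each row becomes an edge; vertices are named after the P1 and P2 values
G = graph_from_edgelist(as.matrix(IDs), directed = FALSE)
# membership is a vector named by vertex, so it can be indexed by the P1 values
IDs$ID = components(G)$membership[IDs$P1]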
IDs
P1 P2 ID
1 a 1 1
2 a 1 1
3 a 2 1
4 b 2 1
5 c 3 2
6 c 4 2
To help visualize this, you can plot the graph with a bipartite layout:
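# Split the vertices into two classes (here, P1 values vs. P2 values)
RES = bipartite_mapping(G)
V(G)$type = RES$type
# Draw the two classes in separate rows
LO = layout_as_bipartite(G)
plot(G, layout=LO)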
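One caveat with the edge-list approach above: vertices are identified by name, so a value that occurred in both P1 and P2 would collapse into a single vertex. The example data has no such overlap, but a minimal sketch that guards against it could prefix each column before building the graph. The `P1_` and `P2_` prefixes and the names `df`, `edges`, `g`, and `memb` are illustrative choices, not part of the original answer.

```r
library(igraph)

df <- data.frame(P1 = c("a", "a", "a", "b", "c", "c"),
                 P2 = c(1, 1, 2, 2, 3, 4),
                 stringsAsFactors = FALSE)

# Prefix each column so a P1 value and a P2 value can never share a vertex name
edges <- cbind(paste0("P1_", df$P1), paste0("P2_", df$P2))
g <- graph_from_edgelist(edges, directed = FALSE)

# membership is named by vertex, so look up each row's component via its P1 vertex
memb <- components(g)$membership
df$ID <- unname(memb[paste0("P1_", df$P1)])
df
```

Looking up the component through the P2 vertex instead would give the same ID, since both ends of an edge always lie in the same component.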
Upvotes: 4