Zhimeng Xu

Reputation: 88

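Identify a unique ID from two variables

Suppose I have a data frame with two columns, P1 and P2. I want to add a new column, ID, such that rows sharing a P1 value get the same ID and rows sharing a P2 value also get the same ID. In other words, rows that are linked through either column, directly or through a chain of other rows, should end up in the same group.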

P1  P2
a   1
a   1
a   2
b   2
c   3
c   4

So, I want to get the ID column as below:

P1  P2 ID
a   1  1
a   1  1
a   2  1
b   2  1
c   3  2
c   4  2

How can I do this in R?

Upvotes: 1

Views: 152

Answers (1)

G5W

Reputation: 37661

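One way to do this is to treat your data as a graph: each distinct value of P1 and P2 is a vertex, and each row is an edge joining its P1 value to its P2 value. The IDs are then the connected components of that graph.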

IDs = read.table(text="P1  P2
a   1
a   1
a   2
b   2
c   3
c   4",
header=TRUE, stringsAsFactors=FALSE)

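library(igraph)
# Each row becomes an undirected edge between its P1 value and its P2 value
G = graph_from_edgelist(as.matrix(IDs), directed = FALSE)
# membership is a vector named by vertex, so it can be looked up by P1
IDs$ID = components(G)$membership[IDs$P1]
IDs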
  P1 P2 ID
1  a  1  1
2  a  1  1
3  a  2  1
4  b  2  1
5  c  3  2
6  c  4  2
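
One caveat with the lookup above: vertices are identified by name, so a P1 value and a P2 value that print the same (for example P1 = "2" and P2 = 2) would be merged into a single vertex. A minimal sketch of one way around that, assuming you are free to prefix the values; the toy data and the "P1_"/"P2_" prefixes are made up for illustration:

```r
library(igraph)

# Hypothetical data where P1 and P2 share the label "2"
IDs <- data.frame(P1 = c("1", "2"), P2 = c(2, 3), stringsAsFactors = FALSE)

# Prefix each column so a P1 value can never collide with a P2 value
edges <- cbind(paste0("P1_", IDs$P1), paste0("P2_", IDs$P2))

G <- graph_from_edgelist(edges, directed = FALSE)

# Look up each row's component through its (prefixed) P1 vertex
IDs$ID <- unname(components(G)$membership[edges[, 1]])
IDs
```

With the prefixes the two rows stay in separate components; without them, "2" would act as a shared vertex and both rows would get the same ID.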

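To help visualize this, plot the graph with a bipartite layout:

# Split the vertices into the two sides of the bipartite graph (P1 vs. P2)
RES = bipartite_mapping(G)
V(G)$type = RES$type
# Lay out the two sides in separate rows and plot
LO = layout_as_bipartite(G)
plot(G, layout=LO)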
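
If installing igraph is not an option, the same connected-components grouping can be sketched in base R with a small union-find. This is only an illustration of the idea, not igraph's implementation; `component_ids` is a hypothetical helper name, and it is not tuned for large data.

```r
# Hypothetical helper: label the connected components of the P1-P2 graph
component_ids <- function(p1, p2) {
  # Prefix the keys so a P1 value never collides with an identical P2 value
  k1 <- paste0("P1_", p1)
  k2 <- paste0("P2_", p2)
  nodes <- unique(c(k1, k2))
  parent <- seq_along(nodes)   # every node starts as its own root

  # Follow parent pointers until reaching a root
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  a <- match(k1, nodes)
  b <- match(k2, nodes)
  for (r in seq_along(a)) {
    ra <- find_root(a[r])
    rb <- find_root(b[r])
    # Merge the two trees by attaching the larger root to the smaller one
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  roots <- vapply(a, find_root, integer(1))
  # Renumber the roots 1, 2, ... in order of first appearance
  match(roots, unique(roots))
}

IDs <- data.frame(P1 = c("a", "a", "a", "b", "c", "c"),
                  P2 = c(1, 1, 2, 2, 3, 4),
                  stringsAsFactors = FALSE)
IDs$ID <- component_ids(IDs$P1, IDs$P2)
IDs$ID
```

For the example data this gives 1 1 1 1 2 2, matching the ID column in the question.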

[Image: Graph View]

Upvotes: 4
