Reputation: 744
I have a dataframe of the form:
index  Name_A  Name_B
0      Adam    Ben
1      Chris   David
2      Adam    Chris
3      Ben     Chris
And I'd like to obtain the adjacency matrix for Name_A and Name_B, i.e.:
       Adam  Ben  Chris  David
Adam      0    1      1      0
Ben       0    0      1      0
Chris     0    0      0      1
David     0    0      0      0
What is the most pythonic/scalable way of tackling this?
EDIT: Also, I know that if the row Adam, Ben is in the dataset, then at some other point, Ben, Adam will also be in the dataset.
Upvotes: 24
Views: 22056
Reputation: 862751
You can use crosstab and then reindex by the union of the column and index values:
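import pandas as pd

# Count each (Name_A, Name_B) pair; store the result under a new name so df stays intact
ct = pd.crosstab(df.Name_A, df.Name_B)
print(ct)
Name_B  Ben  Chris  David
Name_A
Adam      1      1      0
Ben       0      1      0
Chris     0      0      1

# Reindex rows and columns by the union of all names; pairs that never occur become 0
idx = ct.columns.union(ct.index)
adj = ct.reindex(index=idx, columns=idx, fill_value=0)
print(adj)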
       Adam  Ben  Chris  David
Adam      0    1      1      0
Ben       0    0      1      0
Chris     0    0      0      1
David     0    0      0      0
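For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same crosstab-then-reindex idea. The sample frame is rebuilt from the question, and the helper name adjacency_matrix and its source/target parameters are my own labels for illustration, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name_A": ["Adam", "Chris", "Adam", "Ben"],
    "Name_B": ["Ben", "David", "Chris", "Chris"],
})


def adjacency_matrix(frame, source="Name_A", target="Name_B"):
    """Hypothetical helper: square matrix of pair counts over all names."""
    # Count how often each (source, target) pair occurs
    counts = pd.crosstab(frame[source], frame[target])
    # Every name seen in either column, sorted
    names = counts.index.union(counts.columns)
    # Make the matrix square; pairs that never occur are filled with 0
    square = counts.reindex(index=names, columns=names, fill_value=0)
    # Drop any axis labels so the result prints as a plain matrix
    return square.rename_axis(index=None, columns=None)


adj = adjacency_matrix(df)
print(adj)
```

Note that crosstab counts occurrences, so a pair that appears twice would give 2 rather than 1. If the full dataset contains every pair in both directions, as the question's edit says, the resulting matrix should come out symmetric; the four sample rows above do not include the reversed pairs, so this small result is not.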
Upvotes: 41