I am using Python pandas. I have a column of strings, each holding one or more comma-separated names, and I would like to build a co-occurrence matrix of the names in that column.
E.g. I have the following input:
1: Andi
2: Andi, Cindy
3: Thomas, Cindy
4: Cindy, Thomas
And I would like to have the following output:
        Andi  Thomas  Cindy
Andi       1       0      1
Thomas     0       1      2
Cindy      1       2      1
So the combination of Andi and Thomas does not appear in the data, but Cindy and Thomas appear together twice.
Does anybody have an idea how I could do this? That would be really great!
Many thanks and regards,
Andi
Upvotes: 4
You can generate the dummy columns first:
df['A'].str.get_dummies(', ')
Out:
   Andi  Cindy  Thomas
0     1      0       0
1     1      1       0
2     0      1       1
3     0      1       1
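For anyone who wants to reproduce this step, here is a minimal self-contained sketch. The DataFrame is a reconstruction of the question's four rows, and the column name 'A' is simply the name used in the snippet above, so adjust both to your own data.

```python
import pandas as pd

# Assumed reconstruction of the question's input; 'A' is the column
# name used in the answer, not something given in the question.
df = pd.DataFrame({'A': ['Andi', 'Andi, Cindy', 'Thomas, Cindy', 'Cindy, Thomas']})

# Split each string on ', ' and build one 0/1 indicator column per name.
# The columns come out in sorted order: Andi, Cindy, Thomas.
tab = df['A'].str.get_dummies(', ')
print(tab)
```

Note that the separator is ', ' (comma plus space). If your data uses a bare comma or inconsistent spacing, the names would need to be normalised first.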
And use that in the dot product:
tab = df['A'].str.get_dummies(', ')
tab.T.dot(tab)
Out:
        Andi  Cindy  Thomas
Andi       2      1       0
Cindy      1      3       2
Thomas     0      2       2
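The reason the dot product works: entry (a, b) of tab.T.dot(tab) is the sum over rows of tab[a] * tab[b], and that product is 1 only in rows where both names appear. The sketch below recounts each pair by hand and checks it against the matrix; the data and the column name 'A' are the same assumptions as before.

```python
from itertools import combinations_with_replacement

import pandas as pd

# Assumed reconstruction of the question's input.
df = pd.DataFrame({'A': ['Andi', 'Andi, Cindy', 'Thomas, Cindy', 'Cindy, Thomas']})
tab = df['A'].str.get_dummies(', ')

# Co-occurrence matrix via the dot product of the indicator table with itself.
co = tab.T.dot(tab)

# Recount every pair directly: number of rows in which both names are present.
for a, b in combinations_with_replacement(tab.columns, 2):
    both = int(((tab[a] == 1) & (tab[b] == 1)).sum())
    # The matrix is symmetric, so (a, b) and (b, a) must agree with the count.
    assert co.loc[a, b] == both == co.loc[b, a]
```

For a pair with a == b the count is just the number of rows containing that name, which is why the diagonal holds per-person totals.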
The diagonal entries give the number of rows in which each person appears. If you need the diagonal set to 1, as in your expected output, there are several alternatives. One of them is np.fill_diagonal from numpy:
import numpy as np
co_occurrence = tab.T.dot(tab)
np.fill_diagonal(co_occurrence.values, 1)
co_occurrence
Out:
        Andi  Cindy  Thomas
Andi       1      1       0
Cindy      1      1       2
Thomas     0      2       1
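One caveat: filling the diagonal through .values relies on writing into the DataFrame's underlying array in place, which, depending on the pandas version and its copy-on-write settings, may be disallowed or may not propagate back. A sketch of an alternative that works on an explicit copy follows. It also reorders the result to match the row/column order in the question. The names values, result and order are just illustrative.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's input.
df = pd.DataFrame({'A': ['Andi', 'Andi, Cindy', 'Thomas, Cindy', 'Cindy, Thomas']})
tab = df['A'].str.get_dummies(', ')
co_occurrence = tab.T.dot(tab)

# Take an explicit, writable copy of the data and set its diagonal to 1.
values = co_occurrence.to_numpy(copy=True)
np.fill_diagonal(values, 1)

# Wrap the modified array back into a labelled DataFrame.
result = pd.DataFrame(values, index=co_occurrence.index, columns=co_occurrence.columns)

# Optional: reorder rows and columns to the order shown in the question.
order = ['Andi', 'Thomas', 'Cindy']
result = result.reindex(index=order, columns=order)
print(result)
```

This leaves co_occurrence untouched, so the true per-person counts on its diagonal remain available if you need them later.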
Upvotes: 9