Andi Maier

Reputation: 1034

Pandas Crosstabulation and counting

I am using Python pandas. I have a column of comma-separated names, and I would like a cross-tabulation that counts how often each pair of names appears together.

E.g., I have the following input:

1: Andi
2: Andi, Cindy
3: Thomas, Cindy
4: Cindy, Thomas

And I would like to have the following output:

          Andi  Thomas  Cindy
    Andi    1     0      1
    Thomas  0     1      2
    Cindy   1     2      1

Hence, the combination of Andi and Thomas does not appear in the data, but Cindy and Thomas appear together twice.

Does anybody have an idea how I could handle this? That would be really great!

Many thanks and regards,

Andi

Upvotes: 4

Views: 331

Answers (1)

user2285236

Reputation:

You can generate the dummy columns first:

df['A'].str.get_dummies(', ')
Out: 
   Andi  Cindy  Thomas
0     1      0       0
1     1      1       0
2     0      1       1
3     0      1       1

And use that in the dot product:

tab = df['A'].str.get_dummies(', ')

tab.T.dot(tab)
Out: 
        Andi  Cindy  Thomas
Andi       2      1       0
Cindy      1      3       2
Thomas     0      2       2
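
For reference, here is a self-contained sketch of the two steps above. The sample DataFrame is reconstructed from the input in the question, and the column name 'A' is simply the one used in this answer, so adjust both to your data. The dot product works because entry (i, j) sums the product of the indicator columns for names i and j over all rows, which is the number of rows containing both names.

```python
import pandas as pd

# Sample data reconstructed from the question; the column name 'A' is an assumption.
df = pd.DataFrame({'A': ['Andi', 'Andi, Cindy', 'Thomas, Cindy', 'Cindy, Thomas']})

# One indicator column per name: 1 if the name occurs in that row, else 0.
tab = df['A'].str.get_dummies(', ')

# Entry (i, j) counts the rows in which both name i and name j occur.
co_occurrence = tab.T.dot(tab)
print(co_occurrence)
```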

The diagonal entries give the number of occurrences of each person. If you need the diagonal set to 1, there are several alternatives; one of them is np.fill_diagonal from NumPy, which modifies the array in place:

import numpy as np
co_occurrence = tab.T.dot(tab)
np.fill_diagonal(co_occurrence.values, 1)
co_occurrence
Out: 
        Andi  Cindy  Thomas
Andi       1      1       0
Cindy      1      1       2
Thomas     0      2       1
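
Writing through `.values` relies on it being a writable view of the frame's data. Depending on your pandas version and settings (for example with copy-on-write enabled), that array may be read-only, in which case the in-place fill would fail. A possible alternative, sketched below under the same assumptions about the sample data and the column name 'A', is to fill the diagonal on an explicit NumPy copy and rebuild the frame. The name `result` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; the column name 'A' is an assumption.
df = pd.DataFrame({'A': ['Andi', 'Andi, Cindy', 'Thomas, Cindy', 'Cindy, Thomas']})
tab = df['A'].str.get_dummies(', ')
co_occurrence = tab.T.dot(tab)

# Take an explicit, writable copy instead of writing through co_occurrence.values.
counts = co_occurrence.to_numpy(copy=True)
np.fill_diagonal(counts, 1)

# Rebuild a labelled frame; co_occurrence itself is left unchanged.
result = pd.DataFrame(counts, index=co_occurrence.index, columns=co_occurrence.columns)
print(result)
```

This also keeps the original counts available in `co_occurrence` in case the per-person totals on the diagonal are needed later.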

Upvotes: 9
