How can I create Frequency Matrix using all columns

Question

Let's say that I have a dataset that contains 4 binary columns for 2 rows.

It looks like this:

    c1 c2 c3 c4 c5
r1  0   1  0  1 0
r2  1   1  1  1 0

I want to create a matrix that gives the number of occurrences of a column, given that it also occurred in another column. Kinda like a confusion matrix

My desired output is:

   c1 c2  c3  c4 c5
c1  -  1   1   1  0
c2  1  -   1   2  0
c3  1  1   -   1  0
c4  1  2   1   -  0

I have used pandas crosstab but it only gives the desired output when using 2 columns. I want to use all of the columns

piRSquared · Accepted Answer

`dot`

df.T.dot(df)
# same as
# df.T @ df

    c1  c2  c3  c4  c5
c1   1   1   1   1   0
c2   1   2   1   2   0
c3   1   1   1   1   0
c4   1   2   1   2   0
c5   0   0   0   0   0

You can use np.fill_diagonal to make the diagonal zero

d = df.T.dot(df)
np.fill_diagonal(d.to_numpy(), 0)
d

    c1  c2  c3  c4  c5
c1   0   1   1   1   0
c2   1   0   1   2   0
c3   1   1   0   1   0
c4   1   2   1   0   0
c5   0   0   0   0   0

And as long as we're using Numpy, you could go all the way...

a = df.to_numpy()
b = a.T @ a
np.fill_diagonal(b, 0)

pd.DataFrame(b, df.columns, df.columns)

    c1  c2  c3  c4  c5
c1   0   1   1   1   0
c2   1   0   1   2   0
c3   1   1   0   1   0
c4   1   2   1   0   0
c5   0   0   0   0   0

How can I create Frequency Matrix using all columns

Answers (2)

`dot`

Related Questions