Numpy, Frequency of variable pairs in observations

Question

I have an MxN 2d numpy array "A", where M is the number of observations, and N is the number of variables being examined.

Each entry in A is either 1 or 0: 1 denotes the presence of the variable in that observation, and 0 denotes its absence.

I would like to create an NxN matrix "B" of mutual frequencies for the presence of variables. The entry B[i, j] would be the number of rows in "A" in which variable i and variable j are both present.

For example:

Matrix A has 4 observations and 3 variables:

array([[1, 1, 0],
       [1, 1, 0],
       [0, 1, 1],
       [1, 0, 0]])

Creating B would yield:

array([[3, 2, 0],
       [2, 3, 1],
       [0, 1, 1]])
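A direct, loop-based sketch of this definition can serve as a reference when checking faster approaches. The function name `cooccurrence_counts_loop` is hypothetical and not from the question. For every observation, it finds the variables present and increments B for each pair of them, including each variable paired with itself, which fills the diagonal.

```python
import numpy as np

def cooccurrence_counts_loop(A):
    """Count, for every pair (i, j), the rows of A where columns i and j are both 1."""
    A = np.asarray(A)
    n_vars = A.shape[1]
    B = np.zeros((n_vars, n_vars), dtype=int)
    for row in A:
        # Indices of the variables present in this observation.
        present = np.flatnonzero(row)
        for i in present:
            for j in present:
                B[i, j] += 1
    return B

A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]])
print(cooccurrence_counts_loop(A))
```

This runs in Python-level loops, so it is only practical for small arrays. Its value is as an unambiguous statement of what B should contain.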

What would be a good way to go about this? Thank you.

Alexander · Accepted Answer

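You can use np.matmul to multiply the transpose of A by A, which gives exactly B. Entry [i, j] of A.T times A is the sum over all rows k of A[k, i] * A[k, j]. Each product is 1 only when both variables are present in row k, so the sum counts the rows in which variables i and j occur together.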

import numpy as np

a = np.array(
    [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1],
     [1, 0, 0]]
)

>>> np.matmul(a.T, a)
array([[3, 2, 0],
       [2, 3, 1],
       [0, 1, 1]])
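A few equivalent spellings and one dtype pitfall, sketched with the same example array. The `@` operator is the same as np.matmul for 2-D arrays, and np.einsum writes out the sum over rows explicitly. The diagonal of the result counts how many observations contain each variable, which is just the column sums of a. If the presence flags are stored as booleans, cast them to an integer type first. matmul on bool arrays returns a bool result (whether a pair ever co-occurs), not counts. Small unsigned types such as uint8 also keep their dtype through matmul and can wrap around once a count exceeds 255.

```python
import numpy as np

a = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]])

# Same as np.matmul(a.T, a) for 2-D arrays.
b = a.T @ a

# Explicit contraction: b[i, j] = sum over rows k of a[k, i] * a[k, j].
b_einsum = np.einsum("ki,kj->ij", a, a)
assert np.array_equal(b, b_einsum)

# Diagonal = number of observations containing each variable (column sums).
assert np.array_equal(np.diag(b), a.sum(axis=0))

# With boolean data, matmul stays boolean, so cast to int to get counts.
mask = a.astype(bool)
assert (mask.T @ mask).dtype == np.bool_
m = mask.astype(np.int64)
b_from_bool = m.T @ m
assert np.array_equal(b, b_from_bool)

print(b)
```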
