kitchenprinzessin

Reputation: 1043

How to compute similarities based on co-occurrence matrix?

I have an item-item matrix (1877 x 1877). The values in the matrix represent the number of times two items occurred together. How can I determine the similarity between two items? Through reading, I found a few options, but I am not sure about these approaches. Any input to get started is appreciated.

  1. Use cosine similarity between the two items' vectors.
  2. Turn the matrix into a graph and use a measure like SimRank to compute similarity, possibly using the co-occurrence count as the weight of the edge between two nodes.

Upvotes: 5

Views: 4544

Answers (3)

Hamed Baziyad

Reputation: 2019

If your co-occurrence matrix is symmetric, you don't need to normalize it. For more information about normalizing symmetric and asymmetric co-occurrence matrices, see this paper: Leydesdorff, L. and Vaughan, L., 2006. Co-occurrence matrices and their applications in information science: Extending ACA to the Web environment. Journal of the American Society for Information Science and Technology, 57(12), pp.1616-1628.

Upvotes: 1

Nico

Reputation: 792

I would recommend using cosine similarity. Alternatively, you could calculate the Jaccard similarity for each item pair.
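As a minimal sketch of both measures, assuming each row of the co-occurrence matrix is used as that item's feature vector: the toy 5 x 5 matrix `cooc` and the helper names are made up for illustration, and binarizing the counts (count > 0) before computing Jaccard is one possible choice, not the only one.

```python
import numpy as np

def cosine_similarity_matrix(C):
    """Cosine similarity between every pair of rows of C."""
    C = np.asarray(C, dtype=float)
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows end up with similarity 0, not NaN
    unit = C / norms
    return unit @ unit.T

def jaccard_similarity_matrix(C):
    """Jaccard similarity between the sets of items each row co-occurs with."""
    B = (np.asarray(C) > 0).astype(float)  # assumption: any count > 0 is membership
    inter = B @ B.T
    sizes = B.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    # Pairs with an empty union keep similarity 0.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)

# Toy symmetric co-occurrence counts for 5 hypothetical items.
cooc = np.array([
    [0, 2, 6, 5, 0],
    [2, 0, 5, 6, 0],
    [6, 5, 0, 1, 1],
    [5, 6, 1, 0, 1],
    [0, 0, 1, 1, 0],
])

cos_sim = cosine_similarity_matrix(cooc)
jac_sim = jaccard_similarity_matrix(cooc)
print(np.round(cos_sim, 3))
print(np.round(jac_sim, 3))
```

Note that with a zero diagonal, two items that co-occur heavily only with each other can still get a low row-wise cosine score, so it is worth checking what the diagonal of the real matrix contains.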

After calculating either similarity matrix (affinity matrix), you can use a spectral (or spatial) clustering algorithm, such as sklearn's spectral clustering algorithm, to group those items.
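A rough sketch of that clustering step with scikit-learn, assuming the cosine similarity matrix is passed in as a precomputed affinity: the toy matrix and the choice of `n_clusters=2` are illustrative assumptions, and the number of clusters would have to be chosen for the real data.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

# Toy symmetric co-occurrence counts for 5 hypothetical items.
cooc = np.array([
    [0, 2, 6, 5, 0],
    [2, 0, 5, 6, 0],
    [6, 5, 0, 1, 1],
    [5, 6, 1, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

# Row-wise cosine similarity; non-negative here because the counts are non-negative.
affinity = cosine_similarity(cooc)

# affinity="precomputed" tells scikit-learn that the input is already
# a similarity matrix rather than raw feature vectors.
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(affinity)
print(labels)  # one cluster label per item
```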

Upvotes: 3

mbednarski

Reputation: 798

You can treat it as 1877 items with 1877 features each. If two items are similar, then their co-occurrence vectors will be similar. Given that, you might use NearestNeighbors to find the closest items. There are many metrics available.
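A minimal sketch of this idea with scikit-learn's `NearestNeighbors`, assuming the rows of the matrix are used directly as feature vectors: the toy matrix, the cosine metric, and `n_neighbors=3` are illustrative choices rather than recommendations for the real data.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy symmetric co-occurrence counts for 5 hypothetical items;
# each row is treated as that item's feature vector.
cooc = np.array([
    [0, 2, 6, 5, 0],
    [2, 0, 5, 6, 0],
    [6, 5, 0, 1, 1],
    [5, 6, 1, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

# Cosine distance is one of several available metrics.
nn = NearestNeighbors(n_neighbors=3, metric="cosine")
nn.fit(cooc)

# Querying with the training rows returns each item itself first
# (distance ~0), followed by its closest other items.
distances, indices = nn.kneighbors(cooc)
for item, (dist_row, idx_row) in enumerate(zip(distances, indices)):
    print(item, idx_row[1:], np.round(dist_row[1:], 3))
```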

Also, preprocessing the data may help you. I do not know its distribution, but you might want to normalize the values into the range [0, 1] or do something similar.
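One way such preprocessing could look, as a sketch only: global min-max scaling into [0, 1], with an optional log transform beforehand to dampen very large counts. The toy matrix and the `log1p` step are assumptions added for illustration; whether they help depends on the actual distribution.

```python
import numpy as np

# Toy symmetric co-occurrence counts for 5 hypothetical items.
cooc = np.array([
    [0, 2, 6, 5, 0],
    [2, 0, 5, 6, 0],
    [6, 5, 0, 1, 1],
    [5, 6, 1, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

def min_max_scale(C):
    """Scale all entries of C into [0, 1] using the global min and max."""
    lo, hi = C.min(), C.max()
    if hi == lo:  # constant matrix: avoid division by zero
        return np.zeros_like(C)
    return (C - lo) / (hi - lo)

scaled = min_max_scale(cooc)

# Optional: compress heavy-tailed counts before scaling.
log_scaled = min_max_scale(np.log1p(cooc))

print(np.round(scaled, 3))
print(np.round(log_scaled, 3))
```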

Upvotes: 1
