user3735871

Reputation: 317

Convert co-occurrence matrix to dissimilarity matrix for MDS in scikit-learn

I have a word co-occurrence matrix, like the one below. I'd like to use MDS to reduce its dimensionality and plot it. In sklearn I can create the model with model = MDS(n_components=2, dissimilarity='precomputed', random_state=1) and apply it with output = model.fit_transform(input). My understanding is that the input should be a dissimilarity matrix rather than the similarity matrix that I have. Is that correct? Is there a function I could use to convert this co-occurrence matrix into a dissimilarity matrix? I'm quite new to this. Many thanks for your help.

co-occurrence matrix:
        word1   word2   word3   ...
word1   0       1       3
word2   1       0       5
word3   3       5       1
...

Upvotes: 3

Views: 564

Answers (1)

Metalman

Reputation: 93

It might be too late, but I might have an answer to propose.

I used a similarity matrix with 1s on the diagonal (which is not your case) and found a simple formula to transform it into a dissimilarity matrix: (1 - cell). However, my supervisor found another formula (I can't find the reference anymore) which seems to handle a diagonal with different values. I put some details in this thread, but my AWK program can't be applied to your data, as I simplified the formula to handle my case, where the diagonal only contains 1s.

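The formula that could work for you is:

d_ii' = (s_ii + s_i'i' - 2 * s_ii')^(1/2)

where s_ii' is the similarity between items i and i', and s_ii and s_i'i' are the corresponding diagonal entries.

In my case, where the diagonal only contains 1s, I simplified it to:

d_ii' = (2 - 2 * s_ii')^(1/2)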
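Here is a minimal NumPy sketch of the general formula. The function name similarity_to_dissimilarity and the small example matrix S are hypothetical and only there for illustration. One caveat: with raw co-occurrence counts like yours, the term under the square root can become negative (for word1 and word3 it is 0 + 1 - 2 * 3 = -5). The sketch clips such values to zero, which is my own assumption and not part of the formula, so it may be better to normalise the counts first.

```python
import numpy as np


def similarity_to_dissimilarity(S):
    """Apply d_ij = sqrt(s_ii + s_jj - 2 * s_ij) to a square similarity matrix."""
    S = np.asarray(S, dtype=float)
    diag = np.diag(S)
    # Broadcasting builds the matrix of s_ii + s_jj for every pair (i, j).
    squared = diag[:, None] + diag[None, :] - 2.0 * S
    # Assumption: negative values are clipped to zero so the root stays real.
    squared = np.clip(squared, 0.0, None)
    return np.sqrt(squared)


# Hypothetical similarity matrix with 1s on the diagonal.
S = np.array([
    [1.0, 0.2, 0.6],
    [0.2, 1.0, 0.9],
    [0.6, 0.9, 1.0],
])

D = similarity_to_dissimilarity(S)
print(D)
```

With 1s on the diagonal this gives the same result as the simplified formula. The matrix D could then be passed to model.fit_transform(D) with the MDS settings from the question.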

I hope this helps! :) But maybe I'm wrong. If so, I'd be interested to know the details.

Upvotes: 0
