Calculate cosine similarity and output without duplicates?

Question

I have the following vectors in my toy example:

import pandas as pd

data = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'a': [55, 2123, -19.3, 9, -8],
    'b': [21, -0.1, 0.003, 4, 2.1]
})

I have calculated the similarity matrix (excluding the id column).

from sklearn.metrics.pairwise import cosine_similarity

# Calculate the pairwise cosine similarities 
S = cosine_similarity(data.drop('id', axis=1))

# Wrap the similarity matrix in a DataFrame
df = pd.DataFrame(S)

This returns a matrix/DataFrame with every pairing, including self-similarities and duplicates. Is there an efficient way to calculate the similarities without self-similarities (a vector is 100% similar to itself) and without duplicates (if vectors 1 and 2 have 89% similarity, I don't need the similarity of vectors 2 and 1, since it's the same)?

Answers (1)
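
One possible approach, offered as a minimal sketch rather than a definitive answer: the cosine similarity matrix is symmetric, so every unique pair sits strictly above the diagonal. `np.triu_indices` with `k=1` returns exactly those positions, which drops both the self-similarities and the mirrored duplicates. The output column names `id_1`, `id_2` and `similarity` are my own choice for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

data = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'a': [55, 2123, -19.3, 9, -8],
    'b': [21, -0.1, 0.003, 4, 2.1],
})

# Full symmetric similarity matrix, as in the question
S = cosine_similarity(data.drop('id', axis=1))

# Positions strictly above the diagonal: k=1 skips the diagonal itself,
# so each unordered pair appears once and self-pairs are excluded.
rows, cols = np.triu_indices(len(data), k=1)

# Long-format result; the column names are illustrative assumptions.
ids = data['id'].to_numpy()
pairs = pd.DataFrame({
    'id_1': ids[rows],
    'id_2': ids[cols],
    'similarity': S[rows, cols],
})
print(pairs)
```

For 5 vectors this gives 5 * 4 / 2 = 10 rows. Note that the full matrix is still computed and only the output is filtered. If the matrix itself is too large to hold, `scipy.spatial.distance.pdist(X, metric='cosine')` may be worth a look: it returns the condensed cosine distances (1 minus the similarity) for the same pairs in the same order.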
