Reputation: 155

How to find the percentage of similarity between two arrays

I have two data arrays x and y:

x = array([  0.,   0.,  84.,  80.,  59.,  22.,   0.,   0.,   0.,   0.,  52.,
       122., 117.,   1.,  10.,   0.,   0.,   0.,   0.,   0.,   0.,  92.,
        90.,  74.,  46.,   0.,   0.,   0.,   0.,  28., 121., 117.,  90.,
        54.,   0.,   0.,   0.,   0.,   0.,   0.,  47.,  62.,  54.,  57.,
        23.,  63.,  26.,  62.,  52., 138., 126.,  98.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,  19.,  44.,  74.,  89., 119.,
        77., 141., 137., 119.,   0.,   0.,   0.,   0.,  91., 115.,  89.,
       143., 146.,  45.,   0.,   0.,   0.,  65.,  89.,   1.,   0.,   0.,
         0.])

y = array([  0.,   0.,  79.,  90.,  64.,   3.,   0.,   0.,   0.,   0.,  19.,
       113., 109.,   1.,  25.,   0.,   0.,   0.,   0.,   0.,   0.,  90.,
        99.,  73.,  35.,   0.,   0.,   0.,   0.,  46., 106., 113., 105.,
        52.,   0.,   0.,   0.,   0.,   0.,   0.,  57.,  68.,  47.,  20.,
         0.,  17.,   1.,  14.,  48., 120., 118., 105.,   0.,   0.,   0.,
         0.,   0.,   0.,   4.,   1.,   0.,   0.,   0.,  42.,  47.,  80.,
        86., 125., 121., 111.,  16.,   0.,   0.,   0.,  47.,  72., 112.,
       123., 129.,  82.,   0.,   0.,   0.,  87.,  80.,   0.,   0.,   5.,
         0.])

I want to measure the similarity between x and y in code. I tried SequenceMatcher(), but I'm not sure its similarity percentage is meaningful here: when plotted, the two arrays look very similar, yet the reported similarity is only 39.33%, which seems odd to me. Is there another way to measure the similarity between x and y? If so, how, and what mathematical formula is it based on? Thank you.

My code for checking similarity using SequenceMatcher():

from difflib import SequenceMatcher

# x and y are the NumPy arrays defined above
sm = SequenceMatcher(None, x, y)
a = sm.ratio() * 100  # ratio() returns a value in [0, 1]
print('Similarity x and Testing y : ', round(a, 2), '%')

x and y graph:

Upvotes: 0

Answers (2)

taher almoussali

Reputation: 13

You can use cosine similarity:

Cosine similarity measures the cosine of the angle between two vectors. Treat each array as a vector and compute the cosine similarity between them. If the data are matrices, flatten them to vectors first; the arrays here are already one-dimensional.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

x = np.array([  0.,   0.,  84.,  80.,  59.,  22.,   0.,   0.,   0.,   0.,  52.,
       122., 117.,   1.,  10.,   0.,   0.,   0.,   0.,   0.,   0.,  92.,
        90.,  74.,  46.,   0.,   0.,   0.,   0.,  28., 121., 117.,  90.,
        54.,   0.,   0.,   0.,   0.,   0.,   0.,  47.,  62.,  54.,  57.,
        23.,  63.,  26.,  62.,  52., 138., 126.,  98.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,  19.,  44.,  74.,  89., 119.,
        77., 141., 137., 119.,   0.,   0.,   0.,   0.,  91., 115.,  89.,
       143., 146.,  45.,   0.,   0.,   0.,  65.,  89.,   1.,   0.,   0.,
         0.])

y = np.array([  0.,   0.,  79.,  90.,  64.,   3.,   0.,   0.,   0.,   0.,  19.,
       113., 109.,   1.,  25.,   0.,   0.,   0.,   0.,   0.,   0.,  90.,
        99.,  73.,  35.,   0.,   0.,   0.,   0.,  46., 106., 113., 105.,
        52.,   0.,   0.,   0.,   0.,   0.,   0.,  57.,  68.,  47.,  20.,
         0.,  17.,   1.,  14.,  48., 120., 118., 105.,   0.,   0.,   0.,
         0.,   0.,   0.,   4.,   1.,   0.,   0.,   0.,  42.,  47.,  80.,
        86., 125., 121., 111.,  16.,   0.,   0.,   0.,  47.,  72., 112.,
       123., 129.,  82.,   0.,   0.,   0.,  87.,  80.,   0.,   0.,   5.,
         0.])

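# flatten() returns a 1-D copy; it only matters if the inputs are matrices
matrix1_flat = x.flatten()
matrix2_flat = y.flatten()

# cosine_similarity expects 2-D inputs (one row per sample), hence the extra brackets
similarity_ratio = cosine_similarity([matrix1_flat], [matrix2_flat])[0][0]
print(similarity_ratio)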

Output: 0.9657650274258939
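
Since the question also asks which formula is used: cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms, cos(theta) = (x . y) / (||x|| * ||y||). Below is a minimal sketch of that definition using NumPy alone. The helper name `cosine_sim` is made up for illustration, and the short sample arrays stand in for the full data.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two 1-D arrays (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # The angle is undefined when either vector has zero length
        raise ValueError("cosine similarity is undefined for an all-zero array")
    return float(np.dot(a, b) / denom)

# Short illustrative arrays, not the full data from the question
a = np.array([0., 84., 80., 59., 22., 0.])
b = np.array([0., 79., 90., 64., 3., 0.])
print(f"Similarity: {cosine_sim(a, b) * 100:.2f} %")
```

Note that this measure ignores overall scale: an array and a multiple of itself (for example y = 2 * x) score 1.0, so it compares shape rather than magnitude.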

Upvotes: 0

cavalcantelucas

Reputation: 1382

Consider using the cross-correlation function: https://en.wikipedia.org/wiki/Cross-correlation

Discussion: Computing cross-correlation function?

Numpy implementation: https://numpy.org/doc/stable/reference/generated/numpy.correlate.html
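
As a rough sketch of how this might look with `numpy.correlate`: the helper below (the name `normalized_xcorr` and the normalization choice are assumptions made for illustration, not part of NumPy) mean-centers both arrays and divides by the product of their norms. With that scaling, the value at zero lag equals the Pearson correlation coefficient for equal-length arrays, and the lag of the peak indicates whether one signal is shifted relative to the other.

```python
import numpy as np

def normalized_xcorr(a, b):
    """Normalized cross-correlation at every lag (hypothetical helper).

    Both inputs are mean-centered and scaled so that two identical
    signals give a value of 1.0 at zero lag.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("a constant array has no defined correlation")
    # mode="full" returns len(a) + len(b) - 1 values, one per lag
    corr = np.correlate(a, b, mode="full") / denom
    lags = np.arange(-(len(b) - 1), len(a))
    return lags, corr

# Short illustrative arrays, not the full data from the question
a = np.array([0., 84., 80., 59., 22., 0., 0., 52.])
b = np.array([0., 79., 90., 64., 3., 0., 0., 19.])
lags, corr = normalized_xcorr(a, b)
best = np.argmax(corr)
print("best lag:", lags[best], "peak correlation:", round(corr[best], 4))
```

Multiplying the zero-lag value by 100 gives one possible "percentage", but unlike SequenceMatcher's ratio it can be negative when the signals are anti-correlated.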

Upvotes: 1
