user9472878

Check similarity of array values

I have an array of different values and I'd like to calculate a percentage that expresses how similar all of its elements are, possibly controlled by a threshold parameter.

An array could look like this:

var array = [42.98, 42.89, 42.91, 42.98, 42.88] // should return nearly 100%

var array = [42.98, 22.89, 42.91, 42.98, 42.88] // should return maybe 80%

var array = [42.98, 332.89, 122.91, 5512.98, -12.88] // should return nearly 0%

So 100% means all elements are the same, and 0% means the elements are very different. The threshold would adjust how strict the comparison is.

I do not really know how to solve the problem (I'm an absolute beginner). This is all I have so far, and obviously it does not work:

function checkSimilarity(array, threshold) {
    var sum = array.reduce((a, b) => a + b, 0),
        percentage = 0;
    for (var i =0; i< array.length; i++) {
       var diff = (sum / array.length) * i
       percentage += diff

    }
    return percentage * (threshold/100)
}

Any help with creating a working algorithm would be much appreciated!

Upvotes: 1

Views: 1925

Answers (3)

Jack Dalton

Reputation: 3681

var array1 = [42.98, 42.89, 42.91, 42.98, 42.88] // should return nearly 100%
var array2 = [42.98, 22.89, 42.91, 42.98, 42.88] // should return maybe 80%
var array3 = [42.98, 332.89, 122.91, 5512.98, -12.88] // should return nearly 0%

function calculateRange(data) {
  var dissimilarity;
  var sum = data.reduce((a, b) => a + b, 0);
  var mean = sum / data.length;

  // loop through the passed array
  data.forEach(function(item) {

    // absolute percentage difference of this item from the mean
    var percentageDiff = Math.abs(100 - (item / mean * 100));

    // fold the diff into a running average; each step halves the weight of
    // everything seen so far, so later items count more than earlier ones
    // (compare against undefined so a legitimate 0 is not treated as "unset")
    if (dissimilarity !== undefined) {
      dissimilarity = (dissimilarity + percentageDiff) / 2;
    } else {
      dissimilarity = percentageDiff;
    }

  });

  // subtract the aggregated dissimilarity from 100%
  return 100 - dissimilarity;
}

var array1DOM = document.getElementById("array1")
var array2DOM = document.getElementById("array2")
var array3DOM = document.getElementById("array3")

array1DOM.innerHTML = calculateRange(array1)
array2DOM.innerHTML = calculateRange(array2)
array3DOM.innerHTML = calculateRange(array3)
<div>
    <div id="array1"></div>
    <div id="array2"></div>
    <div id="array3"></div>
</div>

In simple terms, this solution aggregates each value's percentage difference from the mean of the data set and subtracts the result from 100. The first two arrays give answers close to the requested 100% and 80%. The issue arises with the final array: because the model is based on deviation from the mean, the wide spread of values in array3 produces a dissimilarity score above 100, so the returned value is negative.

I cannot resolve this issue because I cannot guess what your maximum difference is. If that value is known, I can normalise the result with it so that the returned range is 0 to 100. If you can never know the maximum difference, the only solutions I can suggest are:

  • Using my method as is, and noting that the lower the score, the less similar the values are (in theory the score has no lower bound)
  • Flooring anything below 0 to 0
  • Calculating the score for several data sets, then using the lowest-scoring one as your 0 and the highest as your 100. That gives you a relative degree of similarity between sets.
  • Estimating the highest level of dissimilarity you could see and passing it into the function, i.e. the minimum or maximum array value you will ever receive in this process.

If you could supply information on the purpose or context of this task, we may be able to be more specific.
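As a rough illustration of the flooring and normalising options in the list above, here is a minimal sketch. It is not the function from this answer: it swaps the running average for a plain mean absolute deviation, and `normalisedSimilarity` and `maxDeviation` are hypothetical names introduced here. `maxDeviation` is the largest average deviation you expect to see, which you would have to estimate for your own data.

```javascript
// Sketch only: `normalisedSimilarity` and `maxDeviation` are assumed names,
// not part of the answer's original code.
function normalisedSimilarity(data, maxDeviation) {
  // nothing to compare, or no usable scale: report 0
  if (data.length === 0 || !(maxDeviation > 0)) return 0;

  var mean = data.reduce((a, b) => a + b, 0) / data.length;

  // average absolute distance of each item from the mean
  var meanDeviation =
    data.reduce((acc, item) => acc + Math.abs(item - mean), 0) / data.length;

  // scale to 0..100 and clamp, so very dissimilar data floors at 0
  var score = 100 * (1 - meanDeviation / maxDeviation);
  return Math.min(100, Math.max(0, score));
}

console.log(normalisedSimilarity([42.98, 332.89, 122.91, 5512.98, -12.88], 50)); // 0
```

With a `maxDeviation` of 50, the first sample array scores just under 100 and the second lands in the high 80s. A smaller `maxDeviation` makes the score stricter.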

Upvotes: 2

wisn

Reputation: 1024

I'm using Euclidean distance for this problem, computed over consecutive pairs of elements. However, I don't know whether this will satisfy your requirements.

const similarity = list => {
  if (list.length < 1) return 0;
  if (list.length < 2) return 100;
  
  let listPair = [];
  for (let i = 0; i < list.length - 1; i++)
    listPair.push({ a: list[i], b: list[i + 1] });
  
  const sum = listPair.reduce((acc, { a, b }) => acc + Math.pow(a - b, 2), 0);
  
  const calculation = 100 - Math.sqrt(sum);
  
  return calculation < 0 ? 0 : calculation;
};

let list = [];
console.log(similarity(list)); // 0%

list = [42.98, 42.89, 42.91, 42.98, 42.88];
console.log(similarity(list)); // ~99%

list = [42.98, 22.89, 42.91, 42.98, 42.88];
console.log(similarity(list)); // ~71%

list = [10, 10, 10, 20, 10];
console.log(similarity(list)); // ~85%

list = [42.98, 332.89, 122.91, 5512.98, -12.88];
console.log(similarity(list)); // 0%

list = [45.51, 45.51, 45.51, 45.51, 45.51];
console.log(similarity(list)); // 100%

list = [10];
console.log(similarity(list)); // 100%

Upvotes: 0

Geuis

Reputation: 42277

Slightly different approach. By no means meant to be the most efficient, but it does work for your sample data.

https://codepen.io/anon/pen/RMWjRL?editors=0010

const array1 = [42.98, 42.89, 42.91, 42.98, 42.88]; // should return nearly 100%
const array2 = [42.98, 22.89, 42.91, 42.98, 42.88]; // should return maybe 80%
const array3 = [42.98, 332.89, 122.91, 5512.98, -12.88]; // should return nearly 0%

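const similarity = (arr) => {
  // an empty array has no elements to compare
  if (arr.length === 0) return 0;

  // count how often each value occurs after rounding to the nearest integer
  const dict = {};
  arr.forEach(item => {
    const val = Math.round(item);
    dict[val] = (dict[val] || 0) + 1;
  });

  // size of the largest group of equal (rounded) values
  const largest = Math.max(...Object.values(dict));

  // fraction of the array that belongs to that group
  return largest / arr.length;
};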

console.log(similarity(array1)); // 1
console.log(similarity(array2)); // 0.8
console.log(similarity(array3)); // 0.2
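One limitation of grouping by `Math.round` is that close values on either side of a .5 boundary, such as 42.49 and 42.51, end up in different groups. A possible variation, sketched below, counts neighbours within a tolerance instead. `similarityWithin` and `tolerance` are hypothetical names that are not in the code above. It compares every pair, so it is only meant for small arrays.

```javascript
// Sketch only: `similarityWithin` and `tolerance` are assumed names.
// Returns the share of elements lying within `tolerance` of the
// element that has the most close neighbours.
const similarityWithin = (arr, tolerance) => {
  if (arr.length === 0) return 0;

  let largest = 0;
  arr.forEach(candidate => {
    // count the elements (the candidate included) close to this candidate
    const count = arr.filter(item => Math.abs(item - candidate) <= tolerance).length;
    largest = Math.max(largest, count);
  });

  return largest / arr.length;
};

console.log(similarityWithin([42.98, 42.89, 42.91, 42.98, 42.88], 0.5)); // 1
console.log(similarityWithin([42.98, 22.89, 42.91, 42.98, 42.88], 0.5)); // 0.8
console.log(similarityWithin([42.98, 332.89, 122.91, 5512.98, -12.88], 0.5)); // 0.2
```

Here `tolerance` plays the role of the threshold mentioned in the question: raising it makes more values count as similar.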

Upvotes: 1
