modesitt
modesitt

Reputation: 7210

Percent Similarity of an Array Swift

Say I have two arrays:

var arrayOne = ["Hi", "Hello", "Hey", "Howdy"]
var arrayOne = ["Hi", "Hello", "Hey", "Not Howdy"]

What could I do to compare how similar the array elements are? As in a function that would return 75% Because the first three elements are the same but the last element are not. The arrays I'm using in my project are strings but they will almost entirely match except for a few elements. I need to see What percent the differences are. Any ideas?

Upvotes: 0

Views: 1539

Answers (4)

Antonio
Antonio

Reputation: 72760

A good way to measure the similarity of 2 arrays is to iterate all elements of an array, and keep a cursor on the 2nd array, such that at any time the current element of the iterated array is not greater than the element at the cursor position.

As you may argue, this algorithm require elements to be comparable, and as such it works if the arrays type implements the Comparable interface.

I've worked on a generic function that perform that calculation, here it is:

func compare<T: Comparable>(var lhs: [T], var rhs: [T]) -> (matches: Int, total: Int) {
    lhs.sort { $0 < $1 } // Inline sort
    rhs.sort { $0 < $1 } // Inline sort

    var matches = 0
    var rightSequence = SequenceOf(rhs).generate()
    var right = rightSequence.next()

    for left in lhs {
        while right != nil && left > right {
            right = rightSequence.next()
        }

        if left == right {
            ++matches
            right = rightSequence.next()
        }
    }

    return (matches: matches, total: max(lhs.count, rhs.count))
}

Let me say that the implementation can probably be optimized, but my goal here is to show the algorithm, not to provide its best implementation.

The first thing to do is to obtain a sorted version of each of the 2 arrays - for simplicity, I have declared both parameters as var, which allows me to edit them, leaving all changes in the local scope. That's way I am using in-place sort.

A sequence on the 2nd array is created, called rightSequence, and the first element is extracted, copied into the right variable.

Then the first array is iterated over - for each element, the sequence is advanced to the next element until the left element is not greater than the right one.

Once this is done, left and right are compared for equality, in which case the counter of matches is incremented.

The algorithm works for arrays having repetitions, different sizes, etc.

Upvotes: 0

Leo Dabus
Leo Dabus

Reputation: 236448

let arrayOne = ["Hi", "Hello", "Hey", "Howdy"]
let arrayTwo = ["Hi", "Hello", "Hey", "Not Howdy"]
var matches = 0
for (index, item) in enumerate(arrayOne) {
    if item == arrayTwo[index] {
        matches++
    }
}
Double(matches) / Double(arrayOne.count)   // 0.75

Upvotes: 1

Fonix
Fonix

Reputation: 11597

maybe something like this? (written off top of my head so havent checked if it actually compiles)

var arrayOne = ["Hi", "Hello", "Hey", "Howdy"]
var arrayTwo = ["Hi", "Hello", "Hey", "Not Howdy"]

var matches = 0

for i in 0...arrayOne.count { //assuming the arrays are always the same length
  if arrayOne[i] == arrayTwo[i]{
    matches++
  }
}

var percent = matches / arrayOne.count

Upvotes: 1

Adam Evans
Adam Evans

Reputation: 2082

Both of these algorithms use the idea that if you have two different length arrays, the highest similarity you can have is short length / long length, meaning that the difference in the array lengths are counted as not matching.

  1. You could add all of the terms to a set and then make your percentage the size of the set / length of longest array.

  2. You could sort both arrays and then do a loop with an index variable for each array and compare the values at the two indices, advancing the index for the array that has the "lower" value in the comparison, or increment a counter if they are equivalent. Your percentage would be the counter / length of longest array.

One thing to think about though is how you want to measure similarity in weird cases. Suppose you have two arrays: [1, 2, 3, 4, 5] and [1, 1, 1, 1, 1]. I don't know whether you would want to say they are completely similar, since all of the elements in the second array are in the first array, or if they only have a similarity of 20% because once the 1 in the first array is "used", it can't be used again.

Just some thoughts.

Upvotes: 1

Related Questions