Richard Hoffman
Richard Hoffman

Reputation: 139

Best way to check if two arrays have the same members

I have an array of strings, I need to compare this to another array of strings, but they may be in a different order. What's the best way to compare the two arrays?

This is what I have so far, just wondering if there is a simpler / more efficient way I'm missing.

func unorderedEqual(first, second []string) bool {
    if len(first) != len(second) {
        return false
    }
    for _, value := range first {
        if !contains(value, second) {
            return false
        }
    }
    return true
}

func contains(needle string, haystack []string) bool {
    for _, matchValue := range haystack {
        if matchValue == needle {
            return true
        }
    }
    return false
}

Upvotes: 3

Views: 12695

Answers (5)

hairycoo
hairycoo

Reputation: 101

Ideally, this would have been a comment, but the TLDR is it is both more correct and faster to use Husain's sort and compare.

Details. For anyone looking at the RayfenWindspear answer above (currently highest ranked), at first glance it looks right, but be aware that it does not check for exact equal-ness, but rather only that every element in the second list is in the first list. The converse also needs to be true, but is not checked by this method. Hence it fails this test (bb is repeated).:

assert.False(t, unorderedEqual([]string{"aa", "cc", "bb"}, []string{"aa", "bb", "bb"}))

Of course you can just run it twice to get the correct result, which is only a linear factor

func DoubleUnorderedEqual(a, b []string) bool {
    return unorderedEqual(a, b) && unorderedEqual(b, a)
}

The suggestion for the sort-then-check by Husain should possibly be ranked higher, because it is correct, and is also faster for larger lists.

Here's Husain's code in an exported function:

func SortCompare(a, b []string) bool {
    if len(a) != len(b) {
        return false
    }

    sort.Strings(a)
    sort.Strings(b)

    for i, v := range a {
        if v != b[i] {
            return false
        }
    }
    return true
}

And some tests you can run on it (which it passes)

func TestSortCompare(t *testing.T) {

    assert.True(t, SortCompare([]string{"aa"}, []string{"aa"}))
    assert.False(t, SortCompare([]string{"aa"}, []string{"bb"}))
    assert.False(t, SortCompare([]string{"aa"}, []string{"bb", "cc"}))
    assert.True(t, SortCompare([]string{"cc", "bb"}, []string{"bb", "cc"}))
    assert.True(t, SortCompare([]string{"aa", "cc", "bb"}, []string{"aa", "bb", "cc"}))
    assert.False(t, SortCompare([]string{"aa", "cc", "bb"}, []string{"aa", "bb", "bb"}))
}

And here is some sample benchmarking ....

func getStrings() ([]string, []string) {

    a := []string{"a", "foo", "bar", "ping", "pong"}
    b := []string{"pong", "foo", "a", "bar", "ping"}

    return a, b

}

func BenchmarkSortCompare(b *testing.B) {
    s0, s1 := getStrings()
    var outcome bool
    for n := 0; n < b.N; n++ {
        outcome = SortCompare(s0, s1)
    }
    fmt.Println(outcome)
}

func BenchmarkDoubleUnorderedEqual(b *testing.B) {
    s0, s1 := getStrings()
    var outcome bool
    for n := 0; n < b.N; n++ {
        outcome = DoubleUnorderedEqual(s0, s1)
    }
    fmt.Println(outcome)
}

With the results ...

BenchmarkSortCompare-32                  2637952           498 ns/op
BenchmarkDoubleUnorderedEqual-32         3060261           381 ns/op

So running the map method twice is slightly faster at this small size... but add a few more strings and the sort method wins out by a factor of 10. I did not take into account the impact of the degree of disorder in the strings but they are sufficiently disordered it is not an obviously unfair test at first glance.

func getStrings2() ([]string, []string) {

    a := []string{"a", "foo", "bar", "ping", "pong", "b", "c", "g", "e", "f", "d", "h", "i", "j", "q", "l", "n", "o", "p", "k", "r", "s", "t", "u", "v", "w", "x", "y", "z"}
    b := []string{"pong", "foo", "a", "bar", "ping", "p", "r", "q", "s", "u", "t", "v", "j", "x", "y", "z", "b", "e", "d", "c", "h", "g", "f", "i", "w", "k", "l", "n", "o"}

    return a, b

}

func BenchmarkSortCompare2(b *testing.B) {
    s0, s1 := getStrings2()
    var outcome bool
    for n := 0; n < b.N; n++ {
        outcome = SortCompare(s0, s1)
    }
    fmt.Println(outcome)
}

func BenchmarkDoubleUnorderedEqual2(b *testing.B) {
    s0, s1 := getStrings2()
    var outcome bool
    for n := 0; n < b.N; n++ {
        outcome = DoubleUnorderedEqual(s0, s1)
    }
    fmt.Println(outcome)
}

The results:

BenchmarkSortCompare2-32                  454641          2797 ns/op
BenchmarkDoubleUnorderedEqual2-32          44420         26714 ns/op

Conclusion - I'll be using Husain's sort and compare ....

Upvotes: 5

RayfenWindspear
RayfenWindspear

Reputation: 6274

Given that you are doing a length check, I'm going to go with the assumption that implies that they are 1:1, just ordered differently.

You can do this in one pass (each) using a map[string]bool to check existence in both. This utilizes the fact that the map returns the zero value of a bool, which is false, when the key is not present.

Disclaimer: Technically this is order O(n)*O(map). The Go Programming Language Specification does not make any performance guarantees for map types.

https://play.golang.org/p/2LUjN5LkXLL

func unorderedEqual(first, second []string) bool {
    if len(first) != len(second) {
        return false
    }
    exists := make(map[string]bool)
    for _, value := range first {
        exists[value] = true
    }
    for _, value := range second {
        if !exists[value] {
            return false
        }
    }
    return true
}

If you want to get nit-picky about memory usage, you could save yourself storing a bunch of bools (which is usually negligible, but to each their own) by using a map[string]struct{} (the empty struct), and you just check existence a little differently, as in this example.

https://play.golang.org/p/MjwP_wCtYZV

Set

exists[value] = struct{}{}

Check

if _, ok := exists[value]; !ok {
    return false
}

Upvotes: 7

azrahel
azrahel

Reputation: 1213

Generic, language agnostic:

  1. sort both with fastest available algorithm
  2. iterate over table A and compare with B[currentIndexFromA]
  3. first time you spot difference, you know they hold different data - throw!
  4. you iterated over whole A? - they are the same

I don't know GO, but you seem to be naively searching for each element from A in B. In worst case scenario you get many many iterations over B. Sorting with performant algorithm seems to be way more efficient even though it's additional operation.

I unfortunately will not provide code sample, as I don't know GO.

Upvotes: 1

Husain
Husain

Reputation: 845

You can sort the arrays first and then check by index:

package main

import (
    "fmt"
    "sort"
)

func main() {
    s1 := []string{"first", "second", "third"}
    s2 := []string{"third", "first", "second"}

    if len(s1) != len(s2) {
        fmt.Println("not equal")
    }
    sort.Strings(s1)
    sort.Strings(s2)
    for i := 0; i < len(s1); i++ {
        if s1[i] != s2[i] {
            fmt.Println("not equal")
            return
        }
    }
    fmt.Println("equal")
}

Or in playground.

The idea with sorting is that it makes the comparison easier and faster. Sorting and then comparing index-wise is O(n log n), while 2-loop checking takes O(n^2).

Upvotes: 1

user4466350
user4466350

Reputation:

Using this dedicated package, you would have to use []interface{} instead of []string then proceed as such

package main

import (
    "fmt"

    "github.com/deckarep/golang-set"
)

func main() {
    some := []interface{}{"a", "b", "c"}
    other := []interface{}{"a", "b", "d"}
    fmt.Println(
        mapset.NewThreadUnsafeSetFromSlice(some).
            Difference(mapset.NewThreadUnsafeSetFromSlice(other)).Cardinality() == 0,
    )
    // Output: false
}

Upvotes: 0

Related Questions