Mario Toffia

Reputation: 530

Partial Sort Performance Problems in Go

I'm implementing a Go port of the Spotify/Annoy library at mariotoffia/goannoy. I've run into problems with sorting performance, especially partial sort (partial as in C++ std::partial_sort<>).

Can anyone point me in a direction (or, even better, to a library) that implements a partial sort in Go with good performance?

This is what I've come up with:

The struct containing the pair used for sorting (it is part of my custom priority queue):

type Pair[T constraints.Ordered, S constraints.Ordered] struct {
    First  T
    Second S
}

func (p *Pair[T, S]) Less(other *Pair[T, S]) bool {
    return p.First < other.First ||
        (p.First == other.First && p.Second < other.Second)
}

type Pairs[T constraints.Ordered, S constraints.Ordered] []*Pair[T, S]

func (pq Pairs[_, _]) Len() int {
    return len(pq)
}

func (pq Pairs[_, _]) Less(i, j int) bool {
    return pq[i].First < pq[j].First ||
        (pq[i].First == pq[j].First && pq[i].Second < pq[j].Second)
}

func (pq Pairs[_, _]) Swap(i, j int) {
    pq[i], pq[j] = pq[j], pq[i]
}

func (pq *Pairs[TP, TV]) Push(x interface{}) {
    *pq = append(*pq, x.(*Pair[TP, TV]))
}

func (pq *Pairs[_, _]) Pop() interface{} {
    old := *pq
    n := len(old)
    item := old[n-1]
    *pq = old[:n-1]
    return item
}

func (pq *Pairs[TP, TV]) Top() *Pair[TP, TV] {
    if len(*pq) == 0 {
        return nil
    }

    return (*pq)[0]
}

This is the implementation of the actual partial sort:

func PartialSortSlice[TV interfaces.VectorType, TIX interfaces.IndexTypes](
    s []*Pair[TV, TIX],
    begin, middle, end int,
) {
    if begin >= end || middle <= begin || middle > end {
        return
    }

    // Find the N smallest elements
    N := middle - begin

    if end-begin > 20 && end-begin < 5000000 {
        SortPairs(s)
        return
    }

    for i := 0; i < N; i++ {
        minIndex := begin + i

        // Find the index of the smallest element in the unsorted part
        for j := begin + i + 1; j < end; j++ {
            if s[j].Less(s[minIndex]) {
                minIndex = j
            }
        }

        // Swap elements
        if minIndex != begin+i {
            s[begin+i], s[minIndex] = s[minIndex], s[begin+i]
        }
    }

    // Sort sub-range [begin, middle)
    if N > 15 {
        SortPairs(s[begin:middle])
    } else {
        for i := begin + 1; i < middle; i++ {
            for j := i; j > begin && s[j].Less(s[j-1]); j-- {
                s[j], s[j-1] = s[j-1], s[j]
            }
        }
    }
}

The limits for choosing utils.SortPairs(...) were determined by performance benchmarks, iteratively and manually tuning the cut-over values...
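A rough timing harness for picking such cut-over limits could look like the following (a sketch, not the goannoy benchmark suite; selectSmallest is a hypothetical helper that mirrors the selection loop above, shown on []float64 for brevity):

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"time"
)

// selectSmallest moves the `middle` smallest elements, sorted, to the
// front of s via selection -- the same shape as the loop in
// PartialSortSlice.
func selectSmallest(s []float64, middle int) {
	for k := 0; k < middle; k++ {
		min := k
		for j := k + 1; j < len(s); j++ {
			if s[j] < s[min] {
				min = j
			}
		}
		s[k], s[min] = s[min], s[k]
	}
}

func main() {
	const n, middle = 20000, 100
	src := make([]float64, n)
	for i := range src {
		src[i] = rand.Float64()
	}

	a := append([]float64(nil), src...)
	t0 := time.Now()
	sort.Float64s(a)
	fmt.Println("full sort: ", time.Since(t0))

	b := append([]float64(nil), src...)
	t1 := time.Now()
	selectSmallest(b, middle)
	fmt.Println("selection: ", time.Since(t1))
}
```

Running this for a grid of n and middle values gives the crossover points where each strategy wins.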

The utils.SortPairs(...) uses the standard library sort:

func SortPairs[TV interfaces.VectorType, TIX interfaces.IndexTypes](
    pairs []*Pair[TV, TIX],
) {
    sort.Slice(pairs, func(i, j int) bool {
        return pairs[i].Less(pairs[j])
    })
}

The other version I've come up with is faster in certain situations, but it is too slow when there are lots of large arrays, e.g. millions of elements (I'd guess it hammers the Go GC too hard).

func PartialSortSlice2[TV interfaces.VectorType, TIX interfaces.IndexTypes](
    s []*Pair[TV, TIX],
    begin, middle, end int,
) {
    if begin >= end || middle <= begin || middle > end {
        return
    }

    // Find the N smallest elements using a binary heap
    N := middle - begin
    h := pmnh[TV, TIX]{indices: make([]int, N), data: s}
    for i := 0; i < N; i++ {
        h.indices[i] = i + begin
    }
    heap.Init(&h)
    for i := N; i < end-begin; i++ {
        if s[begin+i].Less(s[h.indices[0]]) {
            h.indices[0] = i + begin
            heap.Fix(&h, 0)
        }
    }

    // Swap elements
    for i := 0; i < N; i++ {
        s[begin+i], s[h.indices[i]-begin] = s[h.indices[i]-begin], s[begin+i]
    }

    // Sort sub-range [begin, middle) in place
    for i := begin + 1; i < middle; i++ {
        for j := i; j > begin && s[j].Less(s[j-1]); j-- {
            s[j], s[j-1] = s[j-1], s[j]
        }
    }
}

(I'll probably fold the second one into the first one for smaller ranges, e.g. < 10000 elements; it seems to perform better there...)

Is there any other solution that does not allocate memory (or allocates very little, very infrequently) and is really, really fast?
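For reference, C++ std::partial_sort is conventionally implemented as an in-place heap-select with zero allocation: treat the prefix as a max-heap, replace its top with any smaller element from the rest, then heap-sort the prefix. A minimal sketch of that idea (partialSort and siftDown are hypothetical helpers, shown on []int for brevity; the same shape works on []*Pair with p.Less):

```go
package main

import "fmt"

// siftDown restores the max-heap property for the subtree rooted at
// lo, within s[:hi].
func siftDown(s []int, lo, hi int) {
	root := lo
	for {
		child := 2*root + 1
		if child >= hi {
			return
		}
		if child+1 < hi && s[child] < s[child+1] {
			child++
		}
		if s[root] >= s[child] {
			return
		}
		s[root], s[child] = s[child], s[root]
		root = child
	}
}

// partialSort leaves the `middle` smallest elements of s sorted at the
// front, allocating nothing.
func partialSort(s []int, middle int) {
	if middle <= 0 || middle > len(s) {
		return
	}
	h := s[:middle]
	// Build a max-heap over the first `middle` elements.
	for i := middle/2 - 1; i >= 0; i-- {
		siftDown(h, i, middle)
	}
	// Any remaining element smaller than the heap max replaces it.
	for i := middle; i < len(s); i++ {
		if s[i] < h[0] {
			s[i], h[0] = h[0], s[i]
			siftDown(h, 0, middle)
		}
	}
	// Heap-sort the prefix into ascending order.
	for i := middle - 1; i > 0; i-- {
		h[0], h[i] = h[i], h[0]
		siftDown(h, 0, i)
	}
}

func main() {
	s := []int{9, 4, 7, 1, 8, 2, 6, 3}
	partialSort(s, 3)
	fmt.Println(s[:3]) // → [1 2 3]
}
```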

To reproduce the poor performance and see the profile graph:

git clone https://github.com/mariotoffia/goannoy.git
cd goannoy
go run cmd/precision/main.go -file -items 1000000 -prec 1000 -cpu-profile
cat results.txt

(use -mem-profile for a memory profile)

Upvotes: 0

Views: 162

Answers (1)

craigb

Reputation: 1091

You need to be clear about what you mean by a "partial" sort. For example, if First is a float64, does "partial" mean that you don't care about the precise order when First differs by less than some tolerance?

If so, and the max/min values of First are reasonably bounded, you could simply create bins of similar First values using a slice of slices of *Pair. The first slice would have length:

(firstMax - firstMin) / delta

where delta is the tolerance you want. Then you simply append every *Pair to the relevant slice using an index like:

(p.First - firstMin) / delta

(you might need to make the slice longer by 1, you could round the result, and/or add some bounds checking).

This groups the Pairs into bins whose First values lie within delta of each other. Each bin of similar Pair entries is unsorted. It makes the partial sort O(n) instead of O(n log n).

tl;dr: this is a bucket sort without the 2nd phase of sorting the subarrays.
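A minimal sketch of this binning idea, assuming First is a bounded float64 (Pair here is a non-generic stand-in, and binPairs is a hypothetical helper):

```go
package main

import "fmt"

type Pair struct {
	First  float64
	Second int
}

// binPairs groups pairs into bins of width delta over
// [firstMin, firstMax]; within a bin, pairs stay unsorted.
func binPairs(pairs []*Pair, firstMin, firstMax, delta float64) [][]*Pair {
	// One extra bin guards against First == firstMax landing past the end.
	n := int((firstMax-firstMin)/delta) + 1
	bins := make([][]*Pair, n)
	for _, p := range pairs {
		i := int((p.First - firstMin) / delta)
		// Clamp in case of rounding at the boundaries.
		if i < 0 {
			i = 0
		} else if i >= n {
			i = n - 1
		}
		bins[i] = append(bins[i], p)
	}
	return bins
}

func main() {
	pairs := []*Pair{{0.95, 1}, {0.12, 2}, {0.10, 3}, {0.51, 4}}
	bins := binPairs(pairs, 0.0, 1.0, 0.1)
	for i, b := range bins {
		if len(b) > 0 {
			fmt.Println("bin", i, "holds", len(b), "pairs")
		}
	}
}
```

Iterating the bins in order then yields the pairs in delta-approximate ascending order of First, in a single O(n) pass.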

Upvotes: 0
