Reputation: 391

Cython: How to sort the contents of one vector by another vector?

I'm trying to sort two C++ vectors in Cython, one by the contents of itself and one by the contents of the first vector.

cimport cython
from libcpp.vector cimport vector
from libcpp.algorithm cimport sort as stdsort

def function():

    cdef vector[np.npy_float] distances
    cdef vector[np.npy_intp] indices

    d = [9., 8., 3., 2., 3.]
    for i in range(len(d)):
        distances.push_back(d[i])
        indices.push_back(i)

    stdsort(distances.begin(), distances.end())
    // distances = [2.0, 3.0, 3.0, 8.0, 9.0]
    // Sort indices by distances?

    return distances, indices

I know in pure C++ you can easily do this with an object that houses both the distance and the index and having a custom sorting function for that object, but what's an easy way to do this in Cython?

Upvotes: 5

Answers (2)

DavidW

Reputation: 30917

I'd be tempted to take the slightly simpler approach of creating a single array encompassing both scores and indices then sorting that. The downside is that it involves a little more copying that @ead's answer.

For this std::pair works pretty well (since it already defines the operators you need) and can readily be accessed from Cython.

# distutils: language = c++

from libcpp.vector cimport vector
from libcpp.utility cimport pair
from libcpp.algorithm cimport sort as stdsort

def function():
    cdef vector[pair[float, int]] di
    cdef vector[float] distances
    cdef vector[int] indices
    

    d = [9., 8., 3., 2., 3.]
    for i in range(len(d)):
        di.push_back(pair[float, int](d[i], i))

    stdsort(di.begin(), di.end())
    
    for di_pair in di:
        distances.push_back(di_pair.first)
        indices.push_back(di_pair.second)

    return distances, indices

For more advanced cases you might have to define a custom struct or class with its own comparators rather than using std::pair. In these cases I wouldn't get too hung up on using Cython for everything - it's generally easier to write C++ in C++. However for this case you don't need to.

Upvotes: 0

ead

Reputation: 34337

For getting sorting indices in C++, one usually would not create a new vector with index-value pairs, but choose a comparator, which would achieve the same without actually copying memory.

Translating this to Cython would look as follows:

%%cython -+ -c=-std=c++11
from libcpp.vector cimport vector
cdef extern from *:
    """
    #include <algorithm>
    #include <vector>
    void sort_via_score(std::vector<int>& indices, const std::vector<double>& scores){
        std::sort(indices.begin(), indices.end(),
                  [&scores](int i, int j){return scores.at(i)<scores.at(j);}
                 );
    }
    """
    void sort_via_score(vector[int]& indices, vector[double]& scores)
    
def sort_indices(lst):
    cdef vector[double] scores = lst
    cdef vector[int] indices = range(len(lst))
    sort_via_score(indices, scores)
    return indices

The function sort_indices is a wrapper which allows us to check the implementation quickly:

sort_indices([5,4,3,2,1])
# [4, 3, 2, 1, 0] as expected

sort_via_score works similar to the following one-liner in Python:

def sort_indices_py(scores):
    return sorted(range(len(scores)), key=lambda x: scores[x])

the scores-vector is used in the closure to look-up the score of the index . There are no new objects created which would put index and its score together in memory - they are combined by logic of the key-function alone.

The solution above uses verbatim C-code, because it is so much easier to write C++ code with C++ than with Cython.

If one really wants to stick to "pure" Cython (I don't recomend), so it is possible to emulate the C++-closures with the following code:

%%cython -+
from libcpp.vector cimport vector
from libcpp.algorithm cimport sort as stdsort

cdef vector[double]* vec
cdef bint comp_fun(int i, int j):
    return vec.at(i)<vec.at(j)

def sort_indices2(lst):
    cdef vector[double] scores = lst
    cdef vector[int] indices = range(len(lst))
    global vec
    vec = &scores # "global closure"
    stdsort(indices.begin(), indices.end(), comp_fun)
    return indices

Upvotes: 3

Cython: How to sort the contents of one vector by another vector?

Answers (2)

Related Questions