Reputation: 391
I'm trying to sort two C++ vectors in Cython, one by the contents of itself and one by the contents of the first vector.
cimport cython
from libcpp.vector cimport vector
from libcpp.algorithm cimport sort as stdsort
def function():
cdef vector[np.npy_float] distances
cdef vector[np.npy_intp] indices
d = [9., 8., 3., 2., 3.]
for i in range(len(d)):
distances.push_back(d[i])
indices.push_back(i)
stdsort(distances.begin(), distances.end())
// distances = [2.0, 3.0, 3.0, 8.0, 9.0]
// Sort indices by distances?
return distances, indices
I know in pure C++ you can easily do this with an object that houses both the distance and the index and having a custom sorting function for that object, but what's an easy way to do this in Cython?
Upvotes: 5
Views: 552
Reputation: 30917
I'd be tempted to take the slightly simpler approach of creating a single array encompassing both scores and indices then sorting that. The downside is that it involves a little more copying that @ead's answer.
For this std::pair
works pretty well (since it already defines the operators you need) and can readily be accessed from Cython.
# distutils: language = c++
from libcpp.vector cimport vector
from libcpp.utility cimport pair
from libcpp.algorithm cimport sort as stdsort
def function():
cdef vector[pair[float, int]] di
cdef vector[float] distances
cdef vector[int] indices
d = [9., 8., 3., 2., 3.]
for i in range(len(d)):
di.push_back(pair[float, int](d[i], i))
stdsort(di.begin(), di.end())
for di_pair in di:
distances.push_back(di_pair.first)
indices.push_back(di_pair.second)
return distances, indices
For more advanced cases you might have to define a custom struct or class with its own comparators rather than using std::pair
. In these cases I wouldn't get too hung up on using Cython for everything - it's generally easier to write C++ in C++. However for this case you don't need to.
Upvotes: 0
Reputation: 34337
For getting sorting indices in C++, one usually would not create a new vector with index-value pairs, but choose a comparator, which would achieve the same without actually copying memory.
Translating this to Cython would look as follows:
%%cython -+ -c=-std=c++11
from libcpp.vector cimport vector
cdef extern from *:
"""
#include <algorithm>
#include <vector>
void sort_via_score(std::vector<int>& indices, const std::vector<double>& scores){
std::sort(indices.begin(), indices.end(),
[&scores](int i, int j){return scores.at(i)<scores.at(j);}
);
}
"""
void sort_via_score(vector[int]& indices, vector[double]& scores)
def sort_indices(lst):
cdef vector[double] scores = lst
cdef vector[int] indices = range(len(lst))
sort_via_score(indices, scores)
return indices
The function sort_indices
is a wrapper which allows us to check the implementation quickly:
sort_indices([5,4,3,2,1])
# [4, 3, 2, 1, 0] as expected
sort_via_score
works similar to the following one-liner in Python:
def sort_indices_py(scores):
return sorted(range(len(scores)), key=lambda x: scores[x])
the scores
-vector is used in the closure to look-up the score of the index . There are no new objects created which would put index and its score together in memory - they are combined by logic of the key
-function alone.
The solution above uses verbatim C-code, because it is so much easier to write C++ code with C++ than with Cython.
If one really wants to stick to "pure" Cython (I don't recomend), so it is possible to emulate the C++-closures with the following code:
%%cython -+
from libcpp.vector cimport vector
from libcpp.algorithm cimport sort as stdsort
cdef vector[double]* vec
cdef bint comp_fun(int i, int j):
return vec.at(i)<vec.at(j)
def sort_indices2(lst):
cdef vector[double] scores = lst
cdef vector[int] indices = range(len(lst))
global vec
vec = &scores # "global closure"
stdsort(indices.begin(), indices.end(), comp_fun)
return indices
Upvotes: 3