LeonM

Reputation: 45

Cython: Memoryviewslice vs C array

I defined a C function that takes a memoryview so that it can work with a NumPy array, but passing a temporary float array declared in pure C to 'base_func' does not compile. Error:

Operation not allowed without gil

How can I modify the C function base_func so that it works with both a numpy.array and a cdef C array?

import numpy as np

cdef void base_func(float[:] vec1) noexcept nogil:
    return

def python_entry(vec: np.ndarray):
    cdef float[:] vec_view = vec
    base_func(vec_view)

cdef void cfunc(float[:] vec2) noexcept nogil:
    cdef float[10] tmp_vec
    base_func(tmp_vec)

Error Message

cdef void cfunc(float[:] vec2) noexcept nogil:
    cdef float[10] tmp_vec
    base_func(tmp_vec)
              ^
------------------------------------------------------------

c_test.pyx:21:14: Operation not allowed without gil
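
As far as I understand it, the error comes from the implicit conversion: passing a C array where a float[:] is expected makes Cython build a memoryview around it, and that creates a Python object, which needs the GIL. One possible workaround is to let base_func take a raw pointer plus a length, so that both callers only hand over an address. This is a minimal sketch, not a definitive fix: the pointer-based signature, the parameter n, and the placeholder loop body are my assumptions, and it only works for C-contiguous float32 input.

```cython
# Sketch: base_func takes a pointer and a length (assumed signature),
# so no memoryview object has to be created without the GIL.
cdef void base_func(float* vec, Py_ssize_t n) noexcept nogil:
    cdef Py_ssize_t i
    for i in range(n):
        vec[i] = vec[i] + 1.0  # placeholder work, not from the original

def python_entry(vec):
    # float[::1] accepts only C-contiguous float32 buffers; anything else
    # raises an error here, while the GIL is still held.
    cdef float[::1] vec_view = vec
    if vec_view.shape[0] > 0:
        base_func(&vec_view[0], vec_view.shape[0])

cdef void cfunc(float[:] vec2) noexcept nogil:
    cdef float[10] tmp_vec
    cdef Py_ssize_t i
    for i in range(10):
        tmp_vec[i] = 0.0  # stack arrays start uninitialized
    # Taking the address of the first element involves no Python object.
    base_func(&tmp_vec[0], 10)
```

The trade-off is that the kernel loses the shape and stride information a memoryview carries, so the caller has to guarantee contiguity and pass the correct length.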

Project Idea

I want to cythonize the GROUP BY operation on a 1D or 2D np.ndarray. The Python interface will look like group_mean(data, group), with group_mean = FuncWrapper(c_group_mean). That way I can write other C functions such as c_group_std to implement another Python interface, group_std.

Problems and Resolutions:

  1. Shape alignment and NaN input: Sometimes data has shape (m, n) while group has shape (n,), so I have to align them, and I use np.where to assign -1 to group wherever data is NaN.
  2. Working on 2D arrays: the C function only works on 1D input, so for 2D data I use prange to process the rows in parallel, which requires nogil mode.
  3. Differently shaped results: For input data of shape (m, n), the output can have shape (m, group_number) for a statistical function such as group_mean, or shape (m, n) for an operation function such as group_demean (subtract the corresponding group mean), so I have different FuncWrapper classes.
  4. A Python object FuncWrapper can't take a C function pointer as a parameter: I create a CFuncWrapper (cdef class) to wrap C functions like c_group_mean, according to this post.
  5. Initializing temporary C arrays and reusing C functions: For example, in c_group_mean I need two arrays to accumulate the sum and the count for each group. My C function template looks like the following, which raises an error on Windows 11 but works on macOS. I also can't call c_mean from c_std and pass my temporary float array mean as the result parameter:
cdef void c_mean(float[:] data, int[:] group, float[:] result, const int length, const int group_number):
    cdef float[group_num] sum_up
    cdef int[group_num] count
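
If the Windows failure comes from the compiler rejecting a stack array whose size is only known at run time (MSVC does not support C99 variable-length arrays), heap allocation is one way to sidestep it. The sketch below assumes a pointer-based signature and an int return code, neither of which is in my original template; it also assumes group values of -1 mark entries to skip, as described in item 1.

```cython
from libc.stdlib cimport calloc, free
from libc.math cimport NAN

# Sketch: returns 0 on success, -1 if allocation failed (assumed convention).
cdef int c_group_mean(const float* data, const int* group, float* result,
                      Py_ssize_t length, Py_ssize_t group_number) noexcept nogil:
    cdef float* sum_up
    cdef int* count
    cdef Py_ssize_t i
    cdef int g
    if group_number <= 0:
        return 0
    # calloc zero-initializes both buffers and needs no GIL.
    sum_up = <float*>calloc(<size_t>group_number, sizeof(float))
    count = <int*>calloc(<size_t>group_number, sizeof(int))
    if sum_up == NULL or count == NULL:
        free(sum_up)  # free(NULL) is a no-op
        free(count)
        return -1
    for i in range(length):
        g = group[i]
        if g < 0 or g >= group_number:
            continue  # -1 marks NaN data; out-of-range ids are ignored
        sum_up[g] += data[i]
        count[g] += 1
    for i in range(group_number):
        if count[i] > 0:
            result[i] = sum_up[i] / count[i]
        else:
            result[i] = NAN  # empty group
    free(sum_up)
    free(count)
    return 0
```

Because result is a plain float*, a function like c_std could pass the address of its own temporary buffer here without holding the GIL.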

So my final code should be:

group_mean = FuncWrapper(CFuncWrapper.bind_cfunc_group_num(c_group_mean))

Further Thinking

  1. Change every float[group_num] sum_up to float* sum_up = <float*>malloc(...) so that it works on Windows 11?
  2. Change the C function template so that both a memoryviewslice and a float* can be passed in? Or change the architecture to:
1. Python Interface: group_mean
2. FuncWrapper: pass in different base function, like c_group_mean
3. CFuncWrapper: make sure python object FuncWrapper can accept cdef functions
4. TODO: convert MemoryviewSlice to float*?
5. base c functions: c_group_mean, c_group_std
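
For step 4 in the layering above, a memoryview does not really need converting: with a C-contiguous 2D memoryview, the address of the first element of a row can be passed to a pointer-based kernel. This is a rough sketch under several assumptions: c_group_sum is a hypothetical stand-in kernel, group has already been aligned to the same (m, n) shape as data with dtype np.intc, and data is float32.

```cython
cimport cython
from cython.parallel import prange
import numpy as np

# Hypothetical 1D kernel: per-group sums written straight into result.
cdef void c_group_sum(const float* data, const int* group, float* result,
                      Py_ssize_t n, Py_ssize_t group_number) noexcept nogil:
    cdef Py_ssize_t j
    cdef int g
    for j in range(group_number):
        result[j] = 0.0
    for j in range(n):
        g = group[j]
        if 0 <= g < group_number:  # -1 (NaN data) is skipped
            result[g] += data[j]

@cython.boundscheck(False)
@cython.wraparound(False)
def group_sum(float[:, ::1] data, int[:, ::1] group, int group_number):
    cdef Py_ssize_t m = data.shape[0]
    cdef Py_ssize_t n = data.shape[1]
    cdef Py_ssize_t i
    result = np.zeros((m, max(group_number, 0)), dtype=np.float32)
    cdef float[:, ::1] res = result
    if m == 0 or n == 0 or group_number <= 0:
        return result
    # Each row is independent, so rows can be handed out to threads.
    for i in prange(m, nogil=True):
        c_group_sum(&data[i, 0], &group[i, 0], &res[i, 0], n, group_number)
    return result
```

Note that prange only runs in parallel when the extension is compiled with OpenMP enabled; without it the loop still works but runs serially.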

Upvotes: 1

Views: 111

Answers (0)
