user1820686

Reputation: 2117

Cython: declare list-like function parameter

I'm trying to create a simple Cython module and have run into the following problem. I would like to write a function like this:

cdef float calc(float[:] a1, float[:] a2):
    cdef float res = 0
    cdef int l = len(a2)
    cdef float item_a2
    cdef float item_a1

    for idx in range(l):
        if a2[idx] > 0:
            item_a2 = a2[idx]
            item_a1 = a1[idx]
            res += item_a2 * item_a1

    return res

When the function is called, a1 and a2 are Python lists, so I get this error:

TypeError: a bytes-like object is required, not 'list'

I just need to perform this calculation and nothing more. How should I declare the parameters float[:] a1 and float[:] a2 if I want the maximum speed-up from C? Do I need to convert the lists to arrays manually?

P.S. I would also appreciate an explanation of whether declaring cdef float item_a2 explicitly is necessary for the multiplication (in terms of performance), or whether it is equivalent to result += a2[idx] * a1[idx]

Upvotes: 0

Views: 2770

Answers (2)

DavidW

Reputation: 30884

cdef float calc(float[:] a1, float[:] a2):

a1 and a2 can be any object that supports the buffer protocol and whose elements are C floats (32-bit). The most common examples are a NumPy array or an array from the standard library array module. Python lists are not accepted because a list is not a block of one homogeneous C type packed contiguously in memory; it is a collection of references to Python objects.
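The distinction can be checked from plain Python. This is a minimal sketch using only the standard library: the built-in memoryview asks an object for its buffer, much as a typed memoryview argument does, so it accepts an array.array and rejects a list. The helper name supports_buffer is hypothetical and not part of Cython or the question.

```python
import array

def supports_buffer(obj):
    """Return True if obj exposes the buffer protocol (hypothetical helper)."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True

floats = array.array('f', [1.0, 2.0])   # packed C floats

print(supports_buffer(floats))          # True
print(supports_buffer([1.0, 2.0]))      # False: a list holds Python objects
print(memoryview(floats).format)        # f (the struct code for C float)
```

The False result for the list corresponds to the TypeError in the question.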

To create a suitable object from a Python list you can do either:

numpy.array([1.0,2.0],dtype=numpy.float32)
array.array('f',[1.0,2.0])

(You may want to consider using double/float64 instead of float for extra precision, but that's your choice)
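To illustrate the precision trade-off, here is a small sketch using the standard library array module (type code 'f' is a C float, 'd' is a C double). It only shows the rounding that happens when a Python float is stored as a 32-bit value; it says nothing about speed.

```python
import array

single = array.array('f', [0.1])   # stored as C float (32-bit)
double = array.array('d', [0.1])   # stored as C double (64-bit)

# Python floats are C doubles, so only the 'f' array loses precision.
print(single[0] == 0.1)             # False
print(double[0] == 0.1)             # True
print(abs(single[0] - 0.1) < 1e-7)  # True: the rounding error is small
```

If you switch to doubles, the Cython signature would need double[:] instead of float[:] so that the element types match.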

If you don't want to create array objects like this, Cython will not help you much, since little speed-up is possible with plain lists.

The np.ndarray[FLOAT, ndim=1] a1 syntax suggested in the other answer is an outdated version of the memoryview syntax you're already using. There are no advantages (and a few small disadvantages) to using it.


result += a2[idx] * a1[idx]

is fine. Cython knows the types of a1 and a2, so there is no need to create temporary intermediate variables. Running cython -a filename.pyx produces an annotated HTML file that highlights the lines that still interact with the Python interpreter, which helps you find the non-accelerated parts.

Upvotes: 3

TayTay

Reputation: 7170

Cython answer

One way you can do this (if you're open to using numpy):

import numpy as np
cimport numpy as np

ctypedef np.npy_float FLOAT
ctypedef np.npy_intp INTP

cdef FLOAT calc(np.ndarray[FLOAT, ndim=1, mode='c'] a1, 
                np.ndarray[FLOAT, ndim=1, mode='c'] a2):
    cdef FLOAT res = 0
    cdef INTP l = a2.shape[0]
    cdef FLOAT item_a2
    cdef FLOAT item_a1

    for idx in range(l):
        if a2[idx] > 0:
            item_a2 = a2[idx]
            item_a1 = a1[idx]
            res += item_a2 * item_a1

    return res

This requires your arrays to have dtype np.float32. If you want np.float64 instead, redefine FLOAT as np.float64_t.

One unsolicited piece of advice: l is a bad name for a variable, since it looks like the digit 1. Consider renaming it length, or something similar.

Pure Python with NumPy

Finally, it looks like you're computing the dot product of two vectors, restricted to the positions where a2 is positive. NumPy can compute the same result quite efficiently:

>>> import numpy as np
>>> a1 = np.array([0, 1, 2, 3, 4, 5, 6])
>>> a2 = np.array([1, 2, 0, 3, -1])
>>> a1[:a2.shape[0]].dot(np.maximum(a2, 0))
11

Note, I added the a1 slice since you didn't check for length equality in your Cython function, but used a2's length. So I assumed the lengths may differ.
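For checking results, the same computation can be written as a pure-Python reference. This is only a sketch for comparison, not a fast implementation, and calc_reference is a hypothetical name. It relies on zip stopping at the shorter input, which matches the a1 slice above when a1 is the longer array.

```python
def calc_reference(a1, a2):
    """Sum a1[i] * a2[i] over positions where a2[i] > 0 (hypothetical helper)."""
    # zip truncates to the shorter sequence, like a1[:len(a2)] when a1 is longer
    return sum(x * y for x, y in zip(a1, a2) if y > 0)

a1 = [0, 1, 2, 3, 4, 5, 6]
a2 = [1, 2, 0, 3, -1]
print(calc_reference(a1, a2))  # 11
```

Note that if a1 were shorter than a2, zip would silently truncate, whereas the Cython loop indexes a1 up to a2's length.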

Upvotes: 1
