Reputation: 1032
I'm a total beginner at using Cython, but I've been following a few introductory tutorials in an attempt to optimize some Python code. I've defined types for all my variables, but cython --annotate
is showing much more Python interaction than expected. Here's the output, and for those of you who can't view images, the code as well:
+148: cdef int[:] get_hp_lengths(str seq):
+149: hp_lens_buf = np.zeros(len(seq), dtype=np.intc)
+150: cdef int[:] hp_lens = hp_lens_buf
151: cdef Py_ssize_t start, stop
152:
+153: for start in range(len(seq)):
+154: for stop in range(start+1, len(seq)):
+155: if seq[stop] != seq[start]:
+156: hp_lens[start] = stop - start
+157: break
+158: if len(seq):
+159: hp_lens[-1] += 1
+160: return hp_lens
Is this because I'm using numpy
, and it should be fast enough already? It looks like str
is also causing Python interaction. Is there a better data type to use, given that I don't know the length of seq
a priori, but it will contain only 'ACGT'
? Basically, I want to verify that I'm following best practices, and there's not a faster way to implement this.
Upvotes: 0
Views: 156
Reputation: 1703
This is probably the best you can do with your code:
c_string
Here is the edited code
cimport cython
@cython.boundscheck(False)
cdef int[::1] get_hp_lengths(const unsigned char[::1] seq):
cdef int seq_len = seq.shape[0]
cdef int[::1] hp_lens = np.zeros(seq_len, dtype= np.intc)
cdef size_t start, stop
for start in range(seq_len):
for stop in range(start+1, seq_len):
if seq[stop] != seq[start]:
hp_lens[start] = stop - start
break
if seq_len > 0:
hp_lens[seq_len] += 1
return hp_lens
Which produces the following html
which looks pretty good!
Upvotes: 1
Reputation: 7157
Most of the python interaction is due to the fact that len
is a python function and that you call it multiple times. However, it only needs to be calculated once. You can further reduce the python interaction by disabling the index checks for hp_lens[start]
by means of the boundscheck
compiler directive:
from cython cimport boundscheck
@boundscheck(False)
cdef int[:] get_hp_lengths(str seq):
cdef int seq_len = len(seq)
hp_lens_buf = np.zeros(seq_len, dtype=np.intc)
cdef int[:] hp_lens = hp_lens_buf
cdef Py_ssize_t start, stop
for start in range(seq_len):
for stop in range(start+1, seq_len):
if seq[stop] != seq[start]:
hp_lens[start] = stop - start
break
if seq_len > 0:
hp_lens[-1] += 1
return hp_lens
Upvotes: 3