Reputation: 1
I am trying to speed up some pure-Python code using Cython. Here is the original Python code:
import numpy as np

def image_to_mblocks(image_component):
    # Split the image into 16x16 macroblocks.
    img_shape = np.shape(image_component)
    v_mblocks = img_shape[0] // 16  # blocks per column
    h_mblocks = img_shape[1] // 16  # blocks per row
    x = image_component
    x = [x[i * 16:(i + 1) * 16, j * 16:(j + 1) * 16]
         for i in range(v_mblocks) for j in range(h_mblocks)]
    return x
The argument image_component is a 2-dimensional numpy.ndarray, where the length of each dimension is evenly divisible by 16. In pure Python, this function is fast: on my machine, 100 calls with an image_component of shape (640, 480) take 80 ms. However, I need to call this function on the order of thousands to tens of thousands of times, so I am interested in speeding it up.
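Timings like the one above can be reproduced with a small timeit harness along these lines (a sketch; the random test image is purely illustrative):

import timeit
import numpy as np

# Dummy 8-bit image whose dimensions are evenly divisible by 16.
image = np.random.randint(0, 256, size=(640, 480), dtype=np.uint8)

# Time 100 calls of the function defined above.
elapsed = timeit.timeit(lambda: image_to_mblocks(image), number=100)
print('100 calls took %.1f ms' % (elapsed * 1000))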
Here is my Cython implementation:
import numpy as np
cimport numpy as np
cimport cython

ctypedef unsigned char DTYPE_pixel

cpdef np.ndarray[DTYPE_pixel, ndim=3] image_to_mblocks(unsigned char[:, :] image_component):
    cdef int i
    cdef int j
    cdef int k = 0
    cdef int v_mblocks = image_component.shape[0] // 16
    cdef int h_mblocks = image_component.shape[1] // 16
    # Pre-allocate a single 3-d array that holds every 16x16 block.
    cdef np.ndarray[DTYPE_pixel, ndim=3] x = np.empty((v_mblocks * h_mblocks, 16, 16), dtype=np.uint8)
    for j in range(h_mblocks):
        for i in range(v_mblocks):
            x[k] = image_component[i * 16:(i + 1) * 16, j * 16:(j + 1) * 16]
            k += 1
    return x
The Cython implementation uses a typed memoryview in order to support slicing of image_component. This Cython implementation takes 250 ms on my machine for 100 iterations (same conditions as before: image_component is a (640, 480) array).
Here is my question: in the example I've given, why does Cython fail to outperform the pure Python implementation?
I believe I've followed all the steps in the Cython documentation for working with numpy arrays, but I've failed to achieve the performance boost that I was expecting.
For reference, here is what my setup.py file looks like:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
import numpy

extensions = [
    Extension('proto_mpeg_computation', ['proto_mpeg_computation.pyx'],
              include_dirs=[numpy.get_include()]),
]

setup(
    name="proto_mpeg_x",
    ext_modules=cythonize(extensions),
)
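(For completeness: with this setup.py, the extension is built with python setup.py build_ext --inplace before importing proto_mpeg_computation.)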
Upvotes: 0
Views: 888
Reputation: 30936
The reason you see significantly worse performance is that the Cython version copies data, while the original version only creates references to existing data.
The line

x[i * 16:(i + 1) * 16, j * 16:(j + 1) * 16]

creates a view on the original x array (i.e. if you change x, then the view will change too). You can confirm this by checking that the numpy owndata flag is False on the elements of the list returned from your Python function. This operation is very cheap, because all it does is store a pointer and some shape/stride information.
In the Cython version you do

x[k] = image_component[i * 16:(i + 1) * 16, j * 16:(j + 1) * 16]

This needs to copy a 16-by-16 block into the memory already allocated for x. It isn't ultra-slow, but there is more work to do than in your original Python version. Again, you can confirm this by checking owndata on the function's return value; you should find that it is True.
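A minimal sketch of both checks (assuming the pure-Python and Cython versions are importable under the placeholder names image_to_mblocks_py and image_to_mblocks_cy):

import numpy as np

image = np.zeros((640, 480), dtype=np.uint8)

# Pure-Python version: each element of the returned list is a view.
blocks_py = image_to_mblocks_py(image)
print(blocks_py[0].flags.owndata)  # False -- a view on `image`

# Cython version: the returned 3-d array owns a fresh copy of the data.
blocks_cy = image_to_mblocks_cy(image)
print(blocks_cy.flags.owndata)  # True -- the data was copied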
In your case you should consider whether you want views of the data or copies of the data.
This isn't the sort of problem where Cython is going to help much, in my view. Cython gives a good speed-up for indexing individual elements; however, when you index slices it behaves the same way as plain Python/numpy (which is actually pretty efficient for this type of use).
I suspect you'd get a small gain from putting your original Python code into Cython and typing image_component as either unsigned char[:, :] or np.ndarray[DTYPE_pixel, ndim=2]. You can also cut out a tiny bit of reference counting by not using the intermediate x and returning the list comprehension directly, as in the sketch below. Beyond that, I don't see how you can gain much.
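A minimal sketch of what I mean (untested; it reuses the DTYPE_pixel typedef from your code and keeps the view-based slicing of the Python version):

import numpy as np
cimport numpy as np

ctypedef unsigned char DTYPE_pixel

cpdef list image_to_mblocks(np.ndarray[DTYPE_pixel, ndim=2] image_component):
    cdef int v_mblocks = image_component.shape[0] // 16
    cdef int h_mblocks = image_component.shape[1] // 16
    # Each slice below is a view, so no pixel data is copied.
    return [image_component[i * 16:(i + 1) * 16, j * 16:(j + 1) * 16]
            for i in range(v_mblocks) for j in range(h_mblocks)]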
Upvotes: 2