Reputation: 235
I want to compile a Python function with Cython that reads a binary file while skipping some records (without reading the whole file and then slicing, as I would run out of memory). I can come up with something like this:
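import numpy

def FromFileSkip(fid, count=1, skip=0, dtype=numpy.float64):
    # dtype was undefined in the original snippet; float64 is an assumed default
    if skip < 0:
        raise ValueError("skip must be non-negative")
    data = numpy.zeros(count, dtype=dtype)
    k = 0
    while k < count:
        try:
            # read one item, then seek `skip` bytes forward from the current position
            data[k] = numpy.fromfile(fid, count=1, dtype=dtype)
            fid.seek(skip, 1)
            k += 1
        except ValueError:
            # fromfile returned an empty array (end of file): keep what was read
            data = data[:k]
            break
    return data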
and then I can use the function like this:
f = open(filename, 'rb')
data = FromFileSkip(f,...
However, to compile the function "FromFileSkip" with Cython, I would like to declare the types of everything involved in the function, including "fid", the file handle. How can I declare its type in Cython, given that it is not a "standard" type such as an integer? Thanks.
Upvotes: 4
Views: 4610
Reputation: 1303
Defining the type of fid won't help, because calling Python functions is still costly. Try compiling your example with the "-a" flag (cython -a) to see what I mean. However, you can use low-level C functions for file handling to avoid Python overhead in your loop. For the sake of example, I assumed that the data starts right at the beginning of the file and that its type is double:
from libc.stdio cimport *

cdef extern from "stdio.h":
    FILE *fdopen(int, const char *)

import numpy as np
cimport numpy as np

DTYPE = np.double  # or whatever your type is
ctypedef np.double_t DTYPE_t  # or whatever your type is

def FromFileSkip(fid, int count=1, int skip=0):
    cdef int k
    cdef int n_read = 0
    cdef FILE* cfile
    cdef np.ndarray[DTYPE_t, ndim=1] data
    cdef DTYPE_t* data_ptr

    cfile = fdopen(fid.fileno(), 'rb')  # attach the stream
    data = np.zeros(count, dtype=DTYPE)
    data_ptr = <DTYPE_t*>data.data

    # maybe skip some header bytes here
    # ...

    for k in range(count):
        # fread returns the number of items read (a size_t, so never negative)
        if fread(<void*>(data_ptr + k), sizeof(DTYPE_t), 1, cfile) < 1:
            break
        n_read += 1
        if fseek(cfile, skip, SEEK_CUR):
            break
    return data[:n_read]  # drop unread slots, as in the question's version
Note that the output of cython -a example.pyx shows no Python overhead inside the loop.
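For checking a compiled version on a small file, a pure-Python reference of the same read-one-item-then-seek pattern can be handy. The sketch below is only an illustration: it uses the standard library alone, the names read_with_skip and demo are hypothetical, and it assumes the records are native-endian C doubles starting at the beginning of the file.

```python
import os
import struct
import tempfile

ITEM = struct.Struct("d")  # assumption: one native-endian C double per record


def read_with_skip(fid, count=1, skip=0):
    """Read up to `count` doubles, seeking `skip` bytes forward after each one."""
    values = []
    for _ in range(count):
        chunk = fid.read(ITEM.size)
        if len(chunk) < ITEM.size:  # short read: end of file reached
            break
        values.append(ITEM.unpack(chunk)[0])
        fid.seek(skip, os.SEEK_CUR)  # relative seek, like fseek(..., SEEK_CUR)
    return values


def demo():
    # Write the doubles 0.0 .. 9.0 to a temporary file, then read every other one.
    with tempfile.TemporaryFile() as f:
        f.write(struct.pack("10d", *[float(i) for i in range(10)]))
        f.seek(0)
        return read_with_skip(f, count=10, skip=ITEM.size)
```

Here demo() asks for ten values but skips one double after each read, so it stops early at the end of the file. Running both versions on the same small file and comparing the values they read is a quick sanity check before trusting the compiled function on large data.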
Upvotes: 5