Reputation: 54068
I have a big block of Cython code that parses Touchstone files, and I want it to work with both Python 2 and Python 3. I'm using very C-style parsing techniques for what I thought would be maximum efficiency, including manually malloc-ing and free-ing char* instead of using bytes so that I can avoid the GIL. When compiled using
python 3.5.2 0 anaconda
cython 0.24.1 py35_0 anaconda
I see speeds that I'm happy with: a moderate boost on small files (~20% faster) and a huge boost on large files (~2.5x faster). When compiled against
python 2.7.12 0 anaconda
cython 0.24.1 py27_0 anaconda
it runs about 125x slower (~17ms in Python 3 vs ~2.2s in Python 2). It's the exact same code compiled in different environments using a pretty simple setuputils script. I'm not currently using NumPy from Cython for any of the parsing or data storage.
import cython
cimport cython
from cython cimport array
import array
from libc.stdlib cimport strtod, malloc, free
from libc.string cimport memcpy
ctypedef long long int64_t # Really VS2008? Couldn't include this by default?
# Bunch of definitions and utility functions omitted
@cython.boundscheck(False)
cpdef Touchstone parse_touchstone(bytes file_contents, int num_ports):
    cdef:
        char c
        char* buffer = <char*> file_contents
        int64_t length_of_buffer = len(file_contents)
        int64_t i = 0
        # These are some cpdef enums
        FreqUnits freq_units
        Domain domain
        Format fmt
        double z0
        bint option_line_found = 0
        # Declarations implied by the loop below; row_size comes from
        # the omitted definitions
        char* end_of_value
        int64_t row_idx = 0
        array.array data = array.array('d')
        array.array row = array.array('d', [0 for _ in range(row_size)])

    while i < length_of_buffer:
        c = buffer[i]  # cdef char c
        if is_whitespace(c):
            i += 1
            continue
        if is_comment_char(c):
            # Returns the last index of the comment
            i = parse_comment(buffer, length_of_buffer)
            continue
        if not option_line_found and is_option_leader_char(c):
            # Returns the last index of the option line
            # assigns values of all references passed in
            i = parse_option_line(
                buffer, length_of_buffer, i,
                &domain, &fmt, &z0, &freq_units)
            if i < 0:
                # Lots of boring code along the lines of
                # if i == some_int:
                #     raise Exception("message")
                # I did this so that only my top-level parse has to interact
                # with the interpreter, all the lower level functions have nogil
                pass  # placeholder for the omitted error handling
            option_line_found = 1
        if option_line_found:
            if is_digit(c):
                # Parse a float
                row[row_idx] = strtod(buffer + i, &end_of_value)
                # Jump the cursor to the end of that float
                # (offset from the start of buffer; the original wrote an
                # undefined `p` here)
                i = end_of_value - buffer - 1
                row_idx += 1
                if row_idx == row_size:
                    # append this row onto the main data array
                    data.extend(row)
                    row_idx = 0
        i += 1
    return Touchstone(num_ports, domain, fmt, z0, freq_units, data)
I've ruled out a few things, such as type casts. I also tested a version where the code simply loops over the entire file doing nothing. Either Cython optimized that away or it's just really fast, because parse_touchstone doesn't even show up in a cProfile/pstats report.

I determined that it's not just the comment, whitespace, and option line parsing (not shown is the significantly more complicated keyword-value parsing). I put a print statement in the last if row_idx == row_size block to print out a status and discovered that it takes about 0.5-1 second (a guesstimate) to parse a row of 512 floating point numbers. That really should not take so long, especially when using strtod to do the parsing. I also tried parsing just 2 rows' worth of values and then jumping out of the while loop, which showed that parsing the comments, whitespace, and option line took about 800ms (1/3 of the overall time), and that was for 6 lines of text totaling less than 150 bytes.
Am I just missing something here? Is there a small trick that would cause Cython code to run two orders of magnitude slower in Python 2 than Python 3?
(Note: I haven't shown the full code here because I'm not sure if I'm allowed to for legal reasons and because it's about 450 lines total)
Upvotes: 2
Views: 295
Reputation: 54068
The problem is with strtod, which is not optimized in VS2008. Python 2.7 extensions on Windows are built with VS2008, while Python 3.5 extensions are built with VS2015, which is why only the Python 2 build is slow. Apparently the VS2008 strtod internally calculates the length of the input string each time it's called, so calling it repeatedly on a long string slows your code down considerably. To circumvent this you have to either write a wrapper around strtod that hands it only a small buffer at a time (see the above link for one example of how to do this) or write your own strtod function.
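For illustration, here is a minimal sketch of what such a wrapper might look like in plain C. It is not the code from the link or from my parser: the names bounded_strtod, is_number_char, and SMALL_BUF_SIZE are made up for this example, and it assumes each value is a plain decimal or exponent-style number shorter than 64 characters (no "inf", "nan", or hex floats). The idea is to copy just the characters of one number into a small stack buffer, so strtod only ever sees a short NUL-terminated string no matter how large the file is.

```c
#include <stdlib.h>

/* Hypothetical size limit for one numeric token (an assumption). */
#define SMALL_BUF_SIZE 64

/* Hypothetical helper: characters that can appear in a decimal float. */
static int is_number_char(char c)
{
    return (c >= '0' && c <= '9') || c == '+' || c == '-' ||
           c == '.' || c == 'e' || c == 'E';
}

/*
 * Parse one double starting at `s`, never reading at or past `end`.
 * On return, *endptr (if not NULL) points into the ORIGINAL buffer,
 * just past the last character strtod consumed.
 */
double bounded_strtod(const char *s, const char *end, char **endptr)
{
    char tmp[SMALL_BUF_SIZE];
    size_t n = 0;
    char *tmp_end;
    double value;

    /* Copy only the number-like prefix, capped at the buffer size. */
    while (s + n < end && n < SMALL_BUF_SIZE - 1 && is_number_char(s[n])) {
        tmp[n] = s[n];
        n++;
    }
    tmp[n] = '\0';

    /* strtod now works on a short string, so any internal length
     * calculation is bounded by SMALL_BUF_SIZE. */
    value = strtod(tmp, &tmp_end);

    if (endptr != NULL) {
        /* Translate the end position back into the caller's buffer. */
        *endptr = (char *)s + (tmp_end - tmp);
    }
    return value;
}
```

In the parsing loop, the call strtod(buffer + i, &end_of_value) would then become something like bounded_strtod(buffer + i, buffer + length_of_buffer, &end_of_value), with the function declared to Cython through a cdef extern block. The copy adds a little work per value, but it keeps the cost of each call independent of the file size.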
Upvotes: 1