bheklilr

Reputation: 54068

Cython code runs 125x slower when compiled against Python 2 vs Python 3

I have a big block of Cython code that parses Touchstone files, and I want it to work with both Python 2 and Python 3. I'm using very C-style parsing techniques for what I thought would be maximum efficiency, including manually malloc-ing and free-ing char* buffers instead of using bytes objects so that I can avoid the GIL. When compiled using

python        3.5.2             0    anaconda
cython        0.24.1       py35_0    anaconda

I see speeds that I'm happy with, a moderate boost on small files (~20% faster) and a huge boost on large files (~2.5x faster). When compiled against

python        2.7.12            0    anaconda
cython        0.24.1       py27_0    anaconda

it runs about 125x slower (~17 ms in Python 3 vs ~2.2 s in Python 2). It's the exact same code compiled in different environments using a pretty simple setuptools script, shown below. I'm not currently using NumPy from Cython for any of the parsing or data storage.
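For reference, the build script is essentially the stock cythonize recipe, something along these lines (module and file names here are illustrative):

from setuptools import setup
from Cython.Build import cythonize

setup(
    name='touchstone_parser',
    ext_modules=cythonize('touchstone_parser.pyx'),
)

And the parser itself (utility definitions trimmed):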

import cython
cimport cython

from cython cimport array
import array

from libc.stdlib cimport strtod, malloc, free
from libc.string cimport memcpy

ctypedef long long int64_t  # Really VS2008? Couldn't include this by default?

# Bunch of definitions and utility functions omitted

@cython.boundscheck(False)
cpdef Touchstone parse_touchstone(bytes file_contents, int num_ports):
    cdef:
        char c
        char* buffer = <char*> file_contents
        int64_t length_of_buffer = len(file_contents)
        int64_t i = 0

        # These are some cpdef enums
        FreqUnits freq_units
        Domain domain
        Format fmt
        double z0
        bint option_line_found = 0

        # Used by the float-parsing branch below; the real definition of
        # row_size (derived from num_ports) is among the omitted code
        char* end_of_value = NULL
        int row_idx = 0
        int row_size = 2 * num_ports * num_ports  # illustrative value

        array.array data = array.array('d')
        array.array row = array.array('d', [0.0 for _ in range(row_size)])

    while i < length_of_buffer:
        c = buffer[i]  # cdef char c
        if is_whitespace(c):
            i += 1
            continue

        if is_comment_char(c):
            # Returns the last index of the comment
            i = parse_comment(buffer, length_of_buffer, i)
            continue

        if not option_line_found and is_option_leader_char(c):
            # Returns the last index of the option line
            # assigns values of all references passed in
            i = parse_option_line(
                buffer, length_of_buffer, i,
                &domain, &fmt, &z0, &freq_units)
            if i < 0:
                # Lots of boring error handling along the lines of
                #     if i == some_int:
                #         raise Exception("message")
                # I did this so that only my top-level parse has to interact
                # with the interpreter; all the lower-level functions are nogil
                raise Exception("failed to parse option line")
            option_line_found = 1

        if option_line_found:
            if is_digit(c):
                # Parse a float
                row[row_idx] = strtod(buffer + i, &end_of_value)
                # Jump the cursor to the end of that float
                i = end_of_value - buffer - 1
                row_idx += 1
                if row_idx == row_size:
                    # append this row onto the main data array
                    data.extend(row)
                    row_idx = 0

        i += 1

    return Touchstone(num_ports, domain, fmt, z0, freq_units, data)

I've ruled out a few things, such as type casts. I also tested a version where the code simply loops over the entire file doing nothing; either Cython optimized that away or it's just really fast, because it causes parse_touchstone to not even show up in a cProfile/pstats report.

I determined that it's not just the comment, whitespace, and option-line parsing (not shown is the significantly more complicated keyword-value parsing) after I threw a print statement into the last if row_idx == row_size block to print out a status: it's taking about 0.5-1 second (guesstimate) to parse a single row of 512 floating-point numbers. That really should not take so long, especially when using strtod to do the parsing. I also checked parsing just 2 rows' worth of values and then jumping out of the while loop, which showed that parsing the comments, whitespace, and option line took about 800 ms (1/3 of the overall time), and that was for 6 lines of text totaling less than 150 bytes.
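For context, the profiling was done with the standard cProfile/pstats recipe, along these lines (module and file names are illustrative):

import cProfile
import pstats

import touchstone_parser  # the compiled Cython module (name illustrative)

with open('example.s16p', 'rb') as f:
    contents = f.read()

cProfile.runctx('touchstone_parser.parse_touchstone(contents, 16)',
                globals(), locals(), 'parse.prof')
pstats.Stats('parse.prof').sort_stats('cumulative').print_stats(10)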

Am I just missing something here? Is there some small trick that would cause Cython code to run more than two orders of magnitude slower under Python 2 than under Python 3?

(Note: I haven't shown the full code here because I'm not sure if I'm allowed to for legal reasons and because it's about 450 lines total)

Upvotes: 2

Views: 295

Answers (1)

bheklilr

Reputation: 54068

The problem is with strtod, which is not optimized in VS2008 (the compiler used to build CPython 2.7 and its extensions on Windows). Apparently it internally calculates the length of the input string each time it's called, so calling it with a pointer into a long buffer slows your code down considerably. To work around this, either write a wrapper around strtod that hands it only a small buffer at a time, or write your own strtod function.
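A minimal sketch of such a wrapper, assuming the caller passes in how many bytes remain in the buffer (the function name and the 64-byte scratch size are illustrative):

from libc.stdlib cimport strtod
from libc.string cimport memcpy

DEF SCRATCH_SIZE = 64  # plenty for one formatted double

cdef double bounded_strtod(char* s, size_t remaining, char** endptr) nogil:
    # Copy at most SCRATCH_SIZE - 1 bytes into a small NUL-terminated
    # scratch buffer so strtod's internal strlen never scans the whole file
    cdef char scratch[SCRATCH_SIZE]
    cdef char* scratch_end = NULL
    cdef size_t n = SCRATCH_SIZE - 1
    cdef double value
    if remaining < n:
        n = remaining
    memcpy(scratch, s, n)
    scratch[n] = 0
    value = strtod(scratch, &scratch_end)
    # Map the end pointer back into the caller's buffer so the parse
    # loop can keep advancing its cursor as before
    endptr[0] = s + (scratch_end - scratch)
    return value

In the parse loop from the question, the call site then becomes row[row_idx] = bounded_strtod(buffer + i, length_of_buffer - i, &end_of_value), with everything else unchanged.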

Upvotes: 1
