Carst

Cython bytes error in Python 3

For my open-source project (bquery) I'm running into an issue with Cython code that works perfectly in Python 2.7 but throws an error in Python 3.x. For the entire code, see: https://github.com/visualfabriq/bquery/pull/66

To give an idea: the code computes a count distinct (the number of unique values) for each group. I use one hash table and check a key built from two values to make sure they are unique. (Otherwise I would need a hash table per group, which might be more efficient in many cases, but not here: with the underlying technology I do not want to run through the values multiple times.) To make the key unique I create a concatenated string (the group index and the value, with a separator in between) and then check it against the hash table. So far, so good! It gives a correct result in Python 2 and is reasonably fast. But in Python 3 I run into errors.
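The grouping idea described above can be sketched in plain Python. This is a minimal sketch, not the bquery implementation: it assumes the group index of every row is already known (the hypothetical `group_index` sequence) and uses a Python `set` in place of the khash string table. It shows why one shared table is enough once the key combines the group index and the value with a separator.

```python
def count_distinct_per_group(group_index, values, n_groups):
    """Count distinct values per group using one shared hash set.

    group_index and values are hypothetical parallel sequences; the set
    stands in for the khash string table used in the Cython code.
    """
    seen = set()              # a single table shared by all groups
    out = [0] * n_groups      # distinct-value count per group
    for idx, v in zip(group_index, values):
        # "index|value" keys: equal values in different groups differ,
        # and the integer index part never contains the "|" separator.
        key = str(idx) + "|" + str(v)
        if key not in seen:
            seen.add(key)     # first time this (group, value) pair is seen
            out[idx] += 1
    return out

print(count_distinct_per_group([0, 0, 1, 1, 1], [1.0, 1.0, 1.0, 2.0, 2.0], 2))
# [1, 2]
```

In pure Python a tuple key such as `(idx, v)` would avoid building strings at all; the string form only mirrors the `char *` key that the khash str table needs.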

This is the code:

cdef:
    kh_str_t * table
    char * element_1
    char * element_2
    char * element_3
    int ret, size_1, size_2, size_3

v = in_buffer[i]
# index
size_1 = len(bytes(current_index)) + 1
element_1 = < char * > malloc(size_1)
strcpy(element_1, bytes(current_index))
# value
size_2 = len(str(v)) + 1
element_2 = < char * > malloc(size_2)
strcpy(element_2, bytes(v))
# combination
size_3 = size_1 + size_2 + 2
element_3 = < char * > malloc(size_3)
strcpy(element_3, element_1 + '|' + element_2)
# hash check
k = kh_get_str(table, element_3)
if k == table.n_buckets:
    # first save the new element
    k = kh_put_str(table, element_3, & ret)
    # then up the amount of values found
    out_buffer[current_index] += 1

And this is the error:

======================================================================
ERROR: test_groupby_08: Groupby's type 'count_distinct'
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/carst/venv3/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/carst/PycharmProjects/bquery/bquery/tests/test_ctable.py", line 516, in test_groupby_08
    result_bcolz = fact_bcolz.groupby(groupby_cols, agg_list)
  File "/home/carst/PycharmProjects/bquery/bquery/ctable.py", line 226, in groupby
    bool_arr=bool_arr)
  File "/home/carst/PycharmProjects/bquery/bquery/ctable.py", line 161, in aggregate_groups
    raise e
  File "/home/carst/PycharmProjects/bquery/bquery/ctable.py", line 155, in aggregate_groups
    agg_op)
  File "bquery/ctable_ext.pyx", line 452, in bquery.ctable_ext.__pyx_fuse_2_0aggregate (bquery/ctable_ext.c:27585)
    cpdef aggregate(carray ca_input, carray ca_factor,
  File "bquery/ctable_ext.pyx", line 653, in bquery.ctable_ext.aggregate (bquery/ctable_ext.c:27107)
    strcpy(element_2, bytes(v))
TypeError: 'float' object is not iterable

I must be overlooking something very obvious, but I do not know what I'm missing. Any guidance or help would be much appreciated!

BR

Carst

Answers (1)

Sergei Lebedev

In Python 2.x, bytes is an alias for str, therefore:

>>> bytes(42.0)
'42.0'

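In Python 3.x, however, bytes has a new constructor which, given anything other than an int, a str together with an encoding, or a bytes-like object, treats its argument as an iterable of ints. A float is none of these and is not iterable, hence the error you're seeing.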

>>> help(bytes)
class bytes(object)
 |  bytes(iterable_of_ints) -> bytes
 |  bytes(string, encoding[, errors]) -> bytes
 |  bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
 |  bytes(int) -> bytes object of size given by the parameter initialized with null bytes
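A quick Python 3 check of the constructor cases listed above, using a hypothetical helper `try_bytes` that returns either the result or the error. The exact TypeError wording differs between 3.x releases (Python 3.5, as in the traceback, says "'float' object is not iterable"), so only the error type is relied on here. Note the int case: assuming current_index in the question is an integer, bytes(current_index) does not fail but yields current_index null bytes, so the C string copied by strcpy is empty rather than the index digits.

```python
def try_bytes(arg):
    """Hypothetical helper: return bytes(arg), or the TypeError text."""
    try:
        return bytes(arg)
    except TypeError as exc:
        return "TypeError: " + str(exc)

print(try_bytes(3))         # b'\x00\x00\x00'  (an int is a length, not digits)
print(try_bytes("42.0"))    # TypeError: a str needs an explicit encoding
print(try_bytes(42.0))      # TypeError: wording depends on the Python version
print(str(42.0).encode())   # b'42.0'
```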

The workaround is to use:

str(v).encode()

Yes, it isn't pretty and requires two copies of data, but it works on both Python 2 and 3.
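Applied to the key from the question, the workaround might look like the sketch below. `make_key` is a hypothetical name, and building the whole "index|value" key as one bytes object at the Python level is an assumption about how you might restructure the code. The separator must then be the bytes literal b"|", because in Python 3 adding a str such as '|' to bytes also raises TypeError. The same expression runs on Python 2, where str(...).encode() and b"|" are both plain str.

```python
def make_key(index, value):
    """Hypothetical helper: build the b"index|value" key as bytes.

    str(...).encode() turns each number into its decimal text first,
    instead of handing it to the bytes() constructor.
    """
    return str(index).encode() + b"|" + str(value).encode()

key = make_key(3, 42.0)
print(key)            # b'3|42.0'
print(len(key) + 1)   # 7: buffer size including the terminating NUL
```

With the key built this way, the buffer to malloc before strcpy is simply len(key) + 1 bytes.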
