Reputation: 1614
For my open source project (bquery) I'm running into an issue with Cython code that works perfectly fine in Python 2.7 but throws an error in Python 3.x. For the entire code, see: https://github.com/visualfabriq/bquery/pull/66
To give an idea: the code computes a count distinct (unique count) of values for each element in a grouping. I check uniqueness with a single hash table keyed on the two values combined. Otherwise I would need a hash table per group, which might be more efficient in many cases, but not here: with the underlying technology I do not want to run through the values multiple times. To make each index/value pair unique, I create a concatenated string (the index, a separator, then the value) and check it against the hash table. So far, so good! This gives a correct result in Python 2 and is reasonably fast. But in Python 3 I run into errors.
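A minimal pure-Python sketch of the single-table idea, for illustration only: a Python set stands in for the khash string table, and the function name count_distinct_per_group is hypothetical, not part of bquery. Each (group, value) pair becomes one "group|value" key, and a group's count goes up only the first time its key is seen.

```python
from collections import defaultdict


def count_distinct_per_group(group_ids, values, sep="|"):
    """Count distinct values per group using ONE shared set of composite keys.

    Hypothetical illustration: a Python set plays the role of the khash
    string table; this is not bquery's actual API.
    """
    seen = set()               # single table shared by all groups
    counts = defaultdict(int)  # group id -> number of distinct values
    for group_id, value in zip(group_ids, values):
        # Composite key "group|value". The group id is an int, so it never
        # contains the separator and the first "|" splits the key unambiguously.
        key = str(group_id) + sep + str(value)
        if key not in seen:
            seen.add(key)
            counts[group_id] += 1
    return dict(counts)


print(count_distinct_per_group([0, 0, 1, 1, 1], [1.5, 1.5, 2.0, 3.0, 2.0]))
# {0: 1, 1: 2}
```

Because the group index is an integer, it can never contain the separator, which is what keeps the concatenated key unambiguous.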
This is the code:
cdef:
    kh_str_t * table
    char * element_1
    char * element_2
    char * element_3
    int ret, size_1, size_2, size_3

v = in_buffer[i]
# index
size_1 = len(bytes(current_index)) + 1
element_1 = <char *> malloc(size_1)
strcpy(element_1, bytes(current_index))
# value
size_2 = len(str(v)) + 1
element_2 = <char *> malloc(size_2)
strcpy(element_2, bytes(v))
# combination
size_3 = size_1 + size_2 + 2
element_3 = <char *> malloc(size_3)
strcpy(element_3, element_1 + '|' + element_2)
# hash check
k = kh_get_str(table, element_3)
if k == table.n_buckets:
    # first save the new element
    k = kh_put_str(table, element_3, &ret)
    # then up the amount of values found
    out_buffer[current_index] += 1
And this is the error:
======================================================================
ERROR: test_groupby_08: Groupby's type 'count_distinct'
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/carst/venv3/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/carst/PycharmProjects/bquery/bquery/tests/test_ctable.py", line 516, in test_groupby_08
    result_bcolz = fact_bcolz.groupby(groupby_cols, agg_list)
  File "/home/carst/PycharmProjects/bquery/bquery/ctable.py", line 226, in groupby
    bool_arr=bool_arr)
  File "/home/carst/PycharmProjects/bquery/bquery/ctable.py", line 161, in aggregate_groups
    raise e
  File "/home/carst/PycharmProjects/bquery/bquery/ctable.py", line 155, in aggregate_groups
    agg_op)
  File "bquery/ctable_ext.pyx", line 452, in bquery.ctable_ext.__pyx_fuse_2_0aggregate (bquery/ctable_ext.c:27585)
    cpdef aggregate(carray ca_input, carray ca_factor,
  File "bquery/ctable_ext.pyx", line 653, in bquery.ctable_ext.aggregate (bquery/ctable_ext.c:27107)
    strcpy(element_2, bytes(v))
TypeError: 'float' object is not iterable
I must be overlooking something obvious, but I can't see what I'm missing. Any guidance would be much appreciated!
BR
Carst
Reputation: 2679
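In Python 2.x, bytes is an alias for str, therefore:
>>> bytes(42.0)
'42.0'
In Python 3.x, however, bytes has a new constructor: an int gives a zero-filled buffer of that length, a str requires an explicit encoding, a bytes-like object is copied, and anything else is treated as an iterable of ints. A float is none of these, hence the error you're seeing.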
>>> help(bytes)
class bytes(object)
| bytes(iterable_of_ints) -> bytes
| bytes(string, encoding[, errors]) -> bytes
| bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
| bytes(int) -> bytes object of size given by the parameter initialized with null bytes
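A short Python 3 sketch of those constructor cases. The exact TypeError messages differ between Python versions, so they are printed rather than quoted. Note that the question's bytes(current_index) hits the int case: it does not raise, but it yields null bytes rather than the digits of the index, so it needs the same fix.

```python
# Python 3: how the bytes() constructor treats different argument types.
print(bytes(3))              # int -> that many null bytes: b'\x00\x00\x00'
print(bytes(b"abc"))         # bytes-like -> copy: b'abc'
print(bytes([52, 50]))       # iterable of ints (ASCII codes) -> b'42'
print(bytes("42", "ascii"))  # str only works with an explicit encoding: b'42'

for arg in (42.0, "42"):
    try:
        bytes(arg)
    except TypeError as exc:
        # float: not an int, not bytes-like, not iterable
        # str: no encoding given (exact message varies by Python version)
        print(type(arg).__name__, "->", exc)

print(str(42.0).encode())    # the text form as bytes: b'42.0'
```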
The workaround is to use:
str(v).encode()
Yes, it isn't pretty and requires two copies of data, but it works on both Python 2 and 3.
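Applied to the question's key, a sketch assuming a hypothetical helper make_key (not part of bquery) that builds the whole "index|value" key in one go. Both parts go through str(...).encode(), which also replaces bytes(current_index).

```python
def make_key(current_index, v, sep="|"):
    """Build the 'index|value' key as bytes (hypothetical helper).

    str(...) gives the text form of both parts on Python 2 and 3, and
    .encode() turns it into bytes, the type a char * can be taken from.
    Unlike bytes(current_index) on Python 3, this yields the digits of
    the index, not a run of null bytes.
    """
    return (str(current_index) + sep + str(v)).encode()


print(make_key(3, 42.0))  # on Python 3: b'3|42.0'
```

In the Cython code, the result would be assigned to a variable before taking its char * (the bytes object must stay referenced while the pointer is in use) and then copied into a malloc'd buffer as the question already does, since the khash string table stores the key pointer rather than a copy.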