dts

Reputation: 125

Why does using indirectly defined converters in numpy.genfromtxt() fail with error "RecursionError: maximum recursion depth exceeded"?

I'm guessing that this is a question only for folks who enjoy the challenge of digging into the source code of Python Tracebacks ... but maybe someone knows the answer off the top of their head.

This should be easily reproducible; see the code below. (Depending on your hardware and your value of sys.setrecursionlimit(), you might need to increase the maximum iteration count from my value of 2000.)

The code below has numpy.genfromtxt reading a CSV file with 1 column and 1 row consisting of the single character 0. When "converters" is set explicitly in the call (the commented-out line below), all is well and fast. When "converters" is set indirectly through a variable, as shown, Python does something recursive that looks totally unnecessary, and the code fails somewhere between 1400 and 1500 iterations (on my computer) with error "RecursionError: maximum recursion depth exceeded". Before the code fails, it gets slower and slower as the iteration count (and presumably the recursion depth) increases. The Traceback points to the source code involved, but I don't know how to dig into it.

The question is: Why doesn't this code behave exactly like the version where "converters" is set explicitly? Is it a bug, or does it make sense, i.e., is my code at fault?

# Spyder 3.3.3 | Python 3.7.3 64-bit | Qt 5.9.6 | PyQt5 5.9.2 | Windows 10

import numpy as np

the_converters = {'data': lambda s: 0}

jcount = 0
while jcount < 2000:

    jcount = jcount + 1
    print(jcount)

    the_array = np.genfromtxt('recursion_debug.csv', delimiter=',',
                              names='data',
                              converters=the_converters,
                              # converters={'data': lambda s: 0},
                              )

Upvotes: 1

Views: 108

Answers (1)

hpaulj

Reputation: 231325

In [1]: txt="""0,0 
   ...: 0,0"""         
In [14]: cvt = {'data':lambda s: 10}                                                                                     
In [15]: cvt                                                                                                             
Out[15]: {'data': <function __main__.<lambda>(s)>}
In [16]: np.genfromtxt(txt.splitlines(),delimiter=',',usecols=[0],names='data',converters=cvt)                           
Out[16]: array([10, 10])
In [17]: cvt                                                                                                             
Out[17]: 
{'data': <function __main__.<lambda>(s)>,
 0: functools.partial(<function genfromtxt.<locals>.tobytes_first at 0x7f5e71154bf8>, conv=<function <lambda> at 0x7f5e70928b70>)}

genfromtxt is modifying the cvt object (in-place), and this effect is cumulative:

In [18]: np.genfromtxt(txt.splitlines(),delimiter=',',usecols=[0],names='data',converters=cvt)                           
Out[18]: array([10, 10])
In [19]: cvt                                                                                                             
Out[19]: 
{'data': <function __main__.<lambda>(s)>,
 0: functools.partial(<function genfromtxt.<locals>.tobytes_first at 0x7f5e82ea4bf8>, conv=functools.partial(<function genfromtxt.<locals>.tobytes_first at 0x7f5e71154bf8>, conv=<function <lambda> at 0x7f5e70928b70>))}

Note that the value under the named key does not change; rather, genfromtxt adds a column-number key (0) whose value is the wrapped converter, and each call wraps the previous wrapper again.
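The cumulative wrapping can be sketched in plain Python. This is a simplified stand-in for what genfromtxt does to the dict on every call: tobytes_first mirrors the wrapper from genfromtxt's source, and the loop plays the role of repeated genfromtxt calls with the same dict.

```python
import functools

def tobytes_first(x, conv):
    # simplified version of the wrapper genfromtxt installs around
    # a user converter when the default bytes decoding is in effect
    if type(x) is bytes:
        return conv(x)
    return conv(x.encode("latin1"))

cvt = {'data': lambda s: 10}

# each simulated genfromtxt call re-wraps whatever is already stored
# under the column-number key 0
for _ in range(3):
    conv = cvt.get(0, cvt['data'])
    cvt[0] = functools.partial(tobytes_first, conv=conv)

print(cvt[0](b'0'))  # still 10, but reached through three nested wrappers
```

Each pass adds one more layer, so calling the stored converter needs one more stack frame per genfromtxt call made with the same dict.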

If instead we create the dictionary in-line and just provide the lambda (or function), the function object itself is not modified:

In [26]: cvt = lambda s: 10                                                                                              
In [27]: np.genfromtxt(txt.splitlines(),delimiter=',',usecols=[0],names='data',converters={'data':cvt})                  
Out[27]: array([10, 10])
In [28]: cvt                                                                                                             
Out[28]: <function __main__.<lambda>(s)>

Now define a function that also prints the input string it receives:

In [53]: def foo(s): 
    ...:     print(s) 
    ...:     return '10' 
    ...:                                                                                                                 
In [54]: cvt = {'data':foo}                                                                                              

If I specify the encoding, the dictionary is still modified (a new key is added), but the function isn't wrapped:

In [55]: np.genfromtxt(txt.splitlines(),delimiter=',',usecols=[0],names='data',converters=cvt, encoding=None)            
0
0
0
Out[55]: array(['10', '10'], dtype='<U2')
In [56]: cvt                                                                                                             
Out[56]: {'data': <function __main__.foo(s)>, 0: <function __main__.foo(s)>}

Without encoding (i.e., with the default encoding='bytes'), the tobytes_first wrapper is added, and a bytestring is passed to my function:

In [57]: np.genfromtxt(txt.splitlines(),delimiter=',',usecols=[0],names='data',converters=cvt)                           
b'0'
b'0'
b'0'
b'0'
Out[57]: array(['10', '10'], dtype='<U2')
In [58]: cvt                                                                                                             
Out[58]: 
{'data': <function __main__.foo(s)>,
 0: functools.partial(<function genfromtxt.<locals>.tobytes_first at 0x7f5e82e9c730>, conv=<function foo at 0x7f5e7113e268>)}

===

The code that adds the functools.partial wrapper is part of the old Py2-to-Py3 bytes-to-unicode transition handling in genfromtxt's source:

    elif byte_converters:
        # converters may use decode to workaround numpy's old behaviour,
        # so encode the string again before passing to the user converter
        def tobytes_first(x, conv):
            if type(x) is bytes:
                return conv(x)
            return conv(x.encode("latin1"))
        import functools
        user_conv = functools.partial(tobytes_first, conv=conv)
    else:
        user_conv = conv
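Chaining that wrapper enough times reproduces the error from the question directly; this sketch just stacks the same tobytes_first logic by hand (the counts and names are illustrative):

```python
import functools
import sys

def tobytes_first(x, conv):
    # same logic as the excerpt above
    if type(x) is bytes:
        return conv(x)
    return conv(x.encode("latin1"))

conv = lambda s: 0
# one extra wrapper layer per simulated genfromtxt call
for _ in range(sys.getrecursionlimit()):
    conv = functools.partial(tobytes_first, conv=conv)

try:
    conv(b'0')
except RecursionError as e:
    print(e)  # e.g. "maximum recursion depth exceeded"
```

Building the chain is cheap; it is the eventual call that has to descend through every layer, which is why the loop in the question also slows down before it finally fails.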

Upvotes: 1
