Reputation: 378
I have a python dictionary with keys being strings and values being lists of strings
myDict = {'key1': ['value1', 'value2'],
'key2': ['value3', 'value4']}
I would like to use this dictionary inside a Numba-jitted function. As a first step, I tried to create a Numba typed dictionary, based on code and docs from the web:
import numpy as np
from numba import njit
from numba.core import types
from numba.typed import Dict
d = Dict.empty(
key_type=types.unicode_type,
value_type=types.ListType(types.unicode_type),
)
# The typed-dict can be used from the interpreter.
d['key1'] = np.asarray(['value1', 'value2'], dtype='U')
d['key2'] = np.asarray(['value3', 'value4'], dtype='U')
but this already results in a wonderful error
No implementation of function Function(<built-in function setitem>) found for signature:
>>> setitem(DictType[unicode_type,ListType[unicode_type]]<iv=None>, unicode_type, array([unichr x 1], 1d, C))
Also re-creating the dict by hand isn't an option, as the dict is pretty large. So I would need to transfer python dictionary to numba dictionary
I tried to reuse the solution provided here: https://numba.discourse.group/t/how-to-convert-a-non-numba-dictionary-to-a-nb-typed-dict/865 But there again, I fail to provide the correct value_type, just as in my toy example above.
Upvotes: 2
Views: 984
Reputation: 50308
I have a python dictionary with keys being strings and values being lists of strings
Note that strings are still barely supported by Numba. There is minimal support, but you cannot do much that is useful with strings, and string operations are currently very slow (as is compilation).
this already results in a wonderful error
This is because the value type is types.ListType(types.unicode_type), while you set the values with np.asarray. Typed lists and arrays are different, incompatible types, so you should pick one of the two. I advise you to use NumPy arrays if the size of the strings does not vary too much. Note that NumPy strings are stored internally as a kind of N-D array of characters, so the longest string determines the shape of the array: the maximum string length is fixed and is part of the type. A list of strings, by contrast, is a list of string references.
If you choose to use lists, then you need to build and fill a typed list (which is different from, and incompatible with, the usual reflected list of CPython).
If you choose NumPy arrays, then you can use the type types.UnicodeCharSeq(16)[:] and pass dtype='U16' to np.asarray. Here, 16 is the maximum number of characters. It must be specified because it is part of the type. If you cannot know this value, then you certainly need to use lists of variable-size (i.e. unbounded) strings. This constraint comes from NumPy, where it is generally not much of an issue since the bound is found dynamically. However, Numba forces types to be well defined at compilation time, so the bound cannot change dynamically and must be fixed before the assignment.
Also re-creating the dict by hand isn't an option, as the dict is pretty large. So I would need to transfer python dictionary to numba dictionary.
Put shortly, this is not possible. Basic pure-Python dicts and lists are called reflected dicts and reflected lists. Reflected lists were previously supported, but they are now deprecated and it is not a good idea to use them. Numba moved to typed dicts and typed lists because reflected lists caused a lot of issues. Indeed, Numba needs items to be statically typed in order to be fast. Operating on reference-counted, GIL-protected pure-Python objects prevents any optimization and any use of the objects in a parallel context, resulting in code as slow as CPython. Pure-Python dicts and lists are inherently inefficient. Typed collections solve this problem at the expense of a mandatory copy.
An alternative solution would be to extract the dict items in an object-mode context, but this is inefficient, and the extracted objects need to be typed in order to be used in an njit context. Not to mention that switching between the two modes is experimental (and still pretty unstable for non-trivial cases like this).
Upvotes: 2