Reputation: 63
I'm having some trouble quickly filling a numba Dict object with key-value pairs (around 63 million of them). Is there an efficient way to do this in parallel?
The documentation (https://numba.pydata.org/numba-doc/dev/reference/pysupported.html#typed-dict) is clear that numba.typed.Dict is not thread-safe, so using prange with a single Dict object would be a bad idea. I've tried using a numba List of Dicts, populating them in parallel and then stitching them together using update, but I think this last step is also inefficient (see the sketch just below).
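This is roughly what that attempt looked like (a simplified sketch: the shard count, element types, and index striding here are illustrative, not my exact code):

from numba import njit, prange, typed, types

@njit(parallel=True)
def fill_shards(keys_list, values_list, shards):
    # each thread writes only to its own Dict, so nothing is shared
    num_shards = len(shards)
    for s in prange(num_shards):
        for i in range(s, len(keys_list), num_shards):
            shards[s][keys_list[i]] = values_list[i]

# the slow part: stitching everything back into a single Dict
def merge_shards(shards, d):
    for shard in shards:
        d.update(shard)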
One thing that may be important: all the keys are unique, i.e. once assigned, a key will never be reassigned a value. I think this property makes the problem amenable to an efficient parallelised solution.
Below is an example of the serial approach, which is slow with a large number of key-value pairs.
from numba import njit, typed, types

d = typed.Dict.empty(
    key_type=types.UnicodeCharSeq(128), value_type=types.int64
)

@njit
def fill_dict(keys_list, values_list, d):
    # one serial pass over all 63M pairs
    n = len(keys_list)
    for i in range(n):
        d[keys_list[i]] = values_list[i]

fill_dict(keys_list, values_list, d)
Can anybody help me?
Many thanks.
Upvotes: 1
Views: 438
Reputation: 11920
You don't have to stitch them together if you preprocess each key into an integer hash and take it modulo num_shard to select a shard.
# assuming hash() returns an arbitrary integer computed from the key's characters

# lookup
shard = hash(key) % num_shard
selected_dictionary = dictionary[shard]
value = selected_dictionary[key]

# inserting
# lock only the selected_dictionary, never the whole table
shard = hash(key) % num_shard
selected_dictionary = dictionary[shard]
selected_dictionary[key] = value
The hash could be something as simple as the sum of the ASCII codes of the characters in the key. The modulo-based indexing partitions the keys into disjoint shards, so the shards can be filled independently, with no extra processing beyond the hashing; a numba sketch of the whole scheme is below.
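Here is a minimal numba sketch of that scheme. It is illustrative rather than tested: key_hash, fill_sharded and lookup are hypothetical names, plain unicode keys stand in for your UnicodeCharSeq(128), and each shard's thread scans the full key list and keeps only the keys that hash to it, trading redundant hashing for lock-free writes.

from numba import njit, prange, typed, types

@njit
def key_hash(key):
    # toy hash: sum of the character codes of the key
    h = 0
    for c in key:
        h += ord(c)
    return h

@njit(parallel=True)
def fill_sharded(keys_list, values_list, shards):
    num_shard = len(shards)
    for s in prange(num_shard):
        # each thread writes only to shards[s], so no Dict is shared
        for i in range(len(keys_list)):
            if key_hash(keys_list[i]) % num_shard == s:
                shards[s][keys_list[i]] = values_list[i]

@njit
def lookup(key, shards):
    # reads route through the same hash, so there is no merge step at all
    return shards[key_hash(key) % len(shards)][key]

num_shard = 8  # e.g. one shard per core
shards = typed.List()
for _ in range(num_shard):
    shards.append(typed.Dict.empty(types.unicode_type, types.int64))

Since all the keys are unique, no two writes ever target the same slot; the price is that every thread hashes every key, which should be cheap next to the Dict insertions themselves.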
Upvotes: 1