jeanie
jeanie

Reputation: 63

Is there an efficient way to fill a numba `Dict` in parallel?

I'm having some trouble quickly filling a numba Dict object with key-value pairs (around 63 million of them). Is there an efficient way to do this in parallel?

The documentation (https://numba.pydata.org/numba-doc/dev/reference/pysupported.html#typed-dict) is clear that numba.typed.Dict is not thread-safe, and so I think to use prange with a single Dict object would be a bad idea. I've tried to use a numba List of Dicts, to populate them in parallel and then stitch them together using update, but I think this last step is also inefficient.

Note that one thing (which may be important) is that all the keys are unique, i.e. once assigned, a key will not be reassigned a value. I think this property makes the problem amenable to an efficient parallelised solution.

Below is an example of the serial approach, which is slow with a large number of key-value pairs.

d = typed.Dict.empty(
    key_type=types.UnicodeCharSeq(128), value_type=types.int64
)

@njit
def fill_dict(keys_list, values_list, d):

    n = len(keys_list)
    for i in range(n):
        d[keys_list[i]] = values_list[i]


fill_dict(keys_list, values_list, d)

Can anybody help me?

Many thanks.

Upvotes: 1

Views: 438

Answers (1)

huseyin tugrul buyukisik
huseyin tugrul buyukisik

Reputation: 11920

You don't have to stitch them together if you preprocess the key into an integer that can be computed for its modulo num_shard value.

# assuming hash() returns an arbitrary integer computed by ascii values
shard = hash(key) % num_shard;
selected_dictionary = dictionary[shard]
value = selected_dictionary[key]

# inserting
# lock only the selected_dictionary
shard = hash(key) % num_shard;
selected_dictionary = dictionary[shard]
selected_dictionary.push((key,value))

The hashing could be something like sum of all ascii codes of chars in key. The modulo based indexing separates blocks of keys so that they can work independently without extra processing except hashing.

Upvotes: 1

Related Questions