user308827

Reputation: 21981

Replace values in NumPy array based on dictionary and avoid overlap between new values and keys

I want to replace values in a 2D NumPy array based on the following dictionary in Python:

code    region
334     0
4       22
8       31
12      16
16      17
24      27
28      18
32      21
36      1

I want to find the cells in the 2D NumPy array that match a code and replace them with the corresponding value in the region column. The issue is that replacing one code at a time turns code = 12 into region = 16, and on the next line all cells with a value of 16 (including the ones that were just assigned 16) get replaced by 17. How do I prevent that?
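A minimal sketch of the problem described above, assuming a small made-up array and only the two overlapping entries of the table (the names `mapping` and `naive` are illustrative, not from the original post):

```python
import numpy as np

# Only the two overlapping rows of the table: 12 -> 16 and 16 -> 17
mapping = {12: 16, 16: 17}
arr = np.array([[12, 16],
                [16, 12]])

naive = arr.copy()
for code, region in mapping.items():
    # Each pass tests the already-modified array, so cells that
    # were just set to 16 are picked up again by the 16 -> 17 pass.
    naive[naive == code] = region

print(naive)  # every cell ends up as 17, although the 12s should be 16
```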

Upvotes: 8

Views: 5687

Answers (2)

Divakar

Reputation: 221614

Here's a vectorized approach based on np.searchsorted: it locates each array element among the dictionary keys and then indexes into the values at those locations. Every element is looked up exactly once against the original array, so a new value can never be mistaken for a key -

import numpy as np

def replace_with_dict(ar, dic):
    # Extract keys and values as arrays (same order in both)
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Indices that would sort the keys
    sidx = k.argsort()

    # searchsorted finds the position of each element of ar among the
    # sorted keys (sorter=sidx, since k itself is not necessarily sorted).
    # Indexing into sidx maps those positions back to the original key
    # order, and indexing into v then gives the replacement values.
    # Assumes every element of ar is a key in dic.
    return v[sidx[np.searchsorted(k,ar,sorter=sidx)]]

Sample run -

In [82]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
    ...: 
    ...: np.random.seed(0)
    ...: a = np.random.choice(dic.keys(), 20)
    ...: 

In [83]: a
Out[83]: 
array([ 28,  16,  32,  32, 334,  32,  28,   4,   8, 334,  12,  36,  36,
        24,  12, 334, 334,  36,  24,  28])

In [84]: replace_with_dict(a, dic)
Out[84]: 
array([18, 17, 21, 21,  0, 21, 18, 22, 31,  0, 16,  1,  1, 27, 16,  0,  0,
        1, 27, 18])
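To make the one-liner easier to follow, here is a rough step-by-step sketch of the same lookup on a shortened dictionary. The intermediate names `pos_sorted` and `pos_orig` are introduced here purely for illustration.

```python
import numpy as np

dic = {334: 0, 4: 22, 8: 31, 12: 16, 16: 17}
k = np.array(list(dic.keys()))      # [334, 4, 8, 12, 16]
v = np.array(list(dic.values()))    # [0, 22, 31, 16, 17]
ar = np.array([12, 16, 334, 4])

# Indices that would sort k; k[sidx] is [4, 8, 12, 16, 334]
sidx = k.argsort()

# Position of each element of ar within the sorted keys
pos_sorted = np.searchsorted(k, ar, sorter=sidx)

# Map those positions back to positions in the unsorted k
pos_orig = sidx[pos_sorted]

# Pick the value stored alongside each matched key
out = v[pos_orig]
print(out)  # 12 maps to 16 and stays 16; it is not chained on to 17
```

Because `out` is built by indexing rather than by writing into `ar`, no element is ever looked up a second time.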

Improvement

A faster version for big arrays sorts the keys and values arrays once up front and then calls searchsorted without sorter, like so -

def replace_with_dict2(ar, dic):
    # Extract out keys and values
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

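    # Get argsort indices
    sidx = k.argsort()
    
    # Sort the keys and reorder the values to match, so that
    # searchsorted positions index directly into vs
    ks = k[sidx]
    vs = v[sidx]
    return vs[np.searchsorted(ks,ar)]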

Runtime test -

In [91]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
    ...: 
    ...: np.random.seed(0)
    ...: a = np.random.choice(dic.keys(), 20000)

In [92]: out1 = replace_with_dict(a, dic)
    ...: out2 = replace_with_dict2(a, dic)
    ...: print np.allclose(out1, out2)
True

In [93]: %timeit replace_with_dict(a, dic)
1000 loops, best of 3: 453 µs per loop
    
In [95]: %timeit replace_with_dict2(a, dic)
1000 loops, best of 3: 341 µs per loop

Generic case when not all array elements are in the dictionary

If some elements of the input array may be missing from the dictionary, we need a bit more work: check which elements actually match a key and leave the others unchanged, as listed below -

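def replace_with_dict2_generic(ar, dic, assume_all_present=True):
    # Extract out keys and values
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Sort keys and reorder values to match
    sidx = k.argsort()
    ks = k[sidx]
    vs = v[sidx]
    idx = np.searchsorted(ks,ar)

    if assume_all_present:
        return vs[idx]

    # Elements larger than every key get index len(ks); reset them to 0
    # so that ks[idx] stays in bounds (the mask below rejects them anyway)
    idx[idx==len(vs)] = 0
    # Replace only exact key matches; leave other elements unchanged
    mask = ks[idx] == ar
    return np.where(mask, vs[idx], ar)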

Sample run -

In [163]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
     ...: 
     ...: np.random.seed(0)
     ...: a = np.random.choice(dic.keys(), (20))
     ...: a[-1] = 400

In [165]: a
Out[165]: 
array([ 28,  16,  32,  32, 334,  32,  28,   4,   8, 334,  12,  36,  36,
        24,  12, 334, 334,  36,  24, 400])

In [166]: replace_with_dict2_generic(a, dic, assume_all_present=False)
Out[166]: 
array([ 18,  17,  21,  21,   0,  21,  18,  22,  31,   0,  16,   1,   1,
        27,  16,   0,   0,   1,  27, 400])
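Since the question is about a 2D array, here is a small sketch suggesting the same lookup carries over unchanged: np.searchsorted returns indices with the same shape as its input. The helper name `replace_2d` and the sample `grid` are made up for this illustration; the body is a condensed form of the generic function above.

```python
import numpy as np

def replace_2d(ar, dic):
    # Hypothetical helper: condensed variant of the generic lookup
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))
    sidx = k.argsort()
    ks, vs = k[sidx], v[sidx]
    idx = np.searchsorted(ks, ar)      # same shape as ar
    idx[idx == len(ks)] = 0            # keep ks[idx] in bounds
    return np.where(ks[idx] == ar, vs[idx], ar)

dic = {334: 0, 4: 22, 8: 31, 12: 16, 16: 17, 24: 27, 28: 18, 32: 21, 36: 1}
grid = np.array([[12, 16, 400],
                 [ 4,  5, 334]])       # 400 and 5 are not keys

result = replace_2d(grid, dic)
print(result)
```

In this sketch the non-key cells (400 and 5) pass through untouched, and the shape of `grid` is preserved.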

Upvotes: 15

Sebastian Mendez

Reputation: 2981

The way I'd do this is in two passes: first, get the indexes corresponding to the values you want to replace, and then replace the values.

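import numpy as np

arr = np.array([1,2,3,1,2,3])
code = np.array([1,2])
region = np.array([2,3])

# Pass 1: record where each code occurs, before anything is modified
index_list = []
for val in code:
    index_list.append(np.where(arr == val)[0])

# Pass 2: write the replacements at the recorded positions
for indexes, replace_val in zip(index_list, region):
    arr[indexes] = replace_val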
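The snippet above uses a 1D array; `np.where(...)[0]` on a 2D array would return row indices only. A possible variation for the 2D case in the question, assuming boolean masks are stored in place of index arrays (the name `masks` and the sample data are illustrative):

```python
import numpy as np

arr = np.array([[12, 16],
                [28, 12]])
code = np.array([12, 16, 28])
region = np.array([16, 17, 18])

# Pass 1: one boolean mask per code, all computed on the untouched array
masks = [arr == val for val in code]

# Pass 2: apply the replacements; masks were fixed before any write,
# so the new 16s are not caught by the 16 -> 17 replacement
for mask, replace_val in zip(masks, region):
    arr[mask] = replace_val

print(arr)
```

This still loops once per dictionary entry, so it is probably best suited to small dictionaries.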

Upvotes: 0
