Reputation: 71
I have a list that contains a number of different labels and I would like to change the labels to something different.
Where, original = [1,1,1,2,2,2,3,3,3,3]
And I want to change each values with, modified = [1,3,8]
so the output would look be original_modified = [1,1,1,3,3,3,8,8,8,8]
So far what I have done is this for loop below:
for x, y in zip(np.unique(original), modified):
original_modified = np.where(original == x, original, y)
However, I am not getting the intended results as the output is incorrect, and I'm not quite sure as to why.
I understand I could get a way with a simple for loop with if conditions however, I am not sure if this would be a very dynamic solution.
Any help is appreciated, thanks.
Upvotes: 0
Views: 1299
Reputation: 12229
I found two problems in your code.
In your loop, you should read from original_modified
instead of starting from original
again at each iteration.
You reversed the last two arguments to np.where()
.
This code works:
original_modified = original
for x, y in zip(np.unique(original), modified):
original_modified = np.where(original == x, y, original_modified)
PS: As @PierreD pointed out, np.unique()
might not be the right choice, since it sorts its results. If you need to preserve the order in which elements first appear in original
, use pd.unique()
instead.
Upvotes: 1
Reputation: 26221
Without numpy
:
out = list(map(dict(zip({k:0 for k in original}.keys(), modified)).get, original))
>>> out
[1, 1, 1, 3, 3, 3, 8, 8, 8, 8]
So why does it work?
{k:0 for k in original}
is a way to find the distinct values in original
, in insertion order (unlike set
where order is undefined). It is a dict
where the keys are the distinct values, and the value is always 0.keys()
and zip with the modified
values into a dict. E.g.
>>> dict(zip({k:0 for k in original}.keys(), modified))
{1: 1, 2: 3, 3: 8}
map(_the_mapping_dict_.get, original)
.Here are a few other ways to achieve the same result, and how long they take.
def pure_py(om):
"""Pure Python"""
original, modified = om
return list(map(dict(zip({k: 0 for k in original}.keys(), modified)).get, original))
def py_with_pd_unique(om):
"""Using a dict for replacement, but using pd.unique() to get the unique values"""
original, modified = om
return list(map(dict(zip(pd.unique(original), modified)).get, original))
def np_select(om):
"""Using np.select and assuming inputs are np.array"""
original, modified = om
return np.select([original == v for v in pd.unique(original)], modified, original)
def vect_dict_get(om):
"""Using a vectorized dict.get()"""
original, modified = om
d = dict(zip(pd.unique(original), modified))
return np.apply_along_axis(np.vectorize(d.get), 0, original)
Then:
import perfplot
from math import isqrt
def setup(n):
original = np.random.randint(0, isqrt(n), n)
modified = np.arange(len(pd.unique(original)))
return original, modified
perfplot.show(
setup=setup,
n_range=[4 ** k for k in range(4, 11)],
kernels=[
pure_py,
py_with_pd_unique,
np_select,
vect_dict_get,
],
xlabel='len(original)',
)
Conclusion: py_with_pd_unique
is the fastest through the range. For 1M elements in original
, it is almost twice as fast as the rest:
o, m = setup(1_000_000)
%timeit pure_py((o, m))
# 209 ms ± 359 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit py_with_pd_unique((o, m))
# 108 ms ± 217 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 3
Reputation: 4273
np.where
will return a boolean index to where the condition is satisfied.
Indexing + assignment should do this:
import numpy as np
original = np.array([1,1,1,2,2,2,3,3,3,3])
out = np.empty(original.shape, dtype=int)
modified = [1, 3, 8]
for x, y in zip(np.unique(original), modified):
out[np.where(original == x)] = y
print(out)
# [1 1 1 3 3 3 8 8 8 8]
Upvotes: 1
Reputation: 13170
Are you really constrained in using np.where
? If not, an alternative solution might be:
import numpy as np
original = np.array([1,1,1,2,2,2,3,3,3,3])
modified = original.copy()
d = {2: 3, 3: 8}
for k, v in d.items():
modified[original == k] = v
print(modified)
# array([1, 1, 1, 3, 3, 3, 8, 8, 8, 8])
Upvotes: 1