Reputation: 111
I want to get the unique elements of a 2D NumPy array, but the array looks like this:
import numpy as np
a = np.array([[1,2,3], [2,3], [1]], dtype=object)  # recent NumPy versions need dtype=object for ragged input
np.unique(a)
So the inner lists have different numbers of elements, and I want a flattened array of the unique elements, like this:
[1,2,3]
But np.unique is not working as expected.
Upvotes: 4
Views: 3461
Reputation: 43494
Another way is to flatten the list using itertools.chain and then use np.unique(). This can be faster than np.concatenate() if you have a very large list.
For example, consider the following:
First generate random data:
from itertools import chain
import numpy as np
import pandas as pd
N = 100000
a = np.array(
    [[np.random.randint(0,1000) for _ in range(np.random.randint(0,10))] for _ in range(N)],
    dtype=object,  # ragged rows: recent NumPy versions require an explicit object dtype
)
Timing results:
%%timeit
np.unique(list(chain.from_iterable(a)))
#10 loops, best of 3: 66.7 ms per loop
%%timeit
np.unique(np.concatenate(a))
#10 loops, best of 3: 123 ms per loop
You could also use pandas.unique, which returns the values in order of appearance rather than sorted, and which according to the docs is:
"Significantly faster than numpy.unique. Includes NA values."
%%timeit
pd.unique(np.concatenate(a))
#10 loops, best of 3: 107 ms per loop
%%timeit
pd.unique(list(chain.from_iterable(a)))
#10 loops, best of 3: 57.2 ms per loop
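For reference, the chain-based approach can be wrapped in a small helper. This is only a minimal sketch: the name unique_flat and the sample list ragged are made up for illustration and are not part of NumPy or itertools. It works directly on a plain nested list, so no object array has to be built first.

```python
from itertools import chain

import numpy as np


def unique_flat(nested):
    """Return the sorted unique values found in an iterable of iterables.

    Hypothetical helper for illustration only.
    """
    # chain.from_iterable lazily walks every inner list; list() materialises
    # the flattened values so np.unique can convert them to a 1-D array.
    flat = list(chain.from_iterable(nested))
    return np.unique(flat)


# Ragged sample input, including an empty inner list.
ragged = [[1, 2, 3], [2, 3], [1], []]
result = unique_flat(ragged)
print(result)
```

Empty inner lists contribute nothing to the flattened list, so they do not affect the result here.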
Upvotes: 1
Reputation: 214927
You have an object-dtype array because the inner lists have different lengths, so np.unique compares the objects (the inner lists) against each other instead of their elements. You need to manually flatten the array into a 1-D array with np.concatenate and then call np.unique:
np.unique(np.concatenate(a))
# array([1, 2, 3])
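A self-contained sketch of the same idea, showing what the array looks like before and after flattening. It assumes a recent NumPy version, where ragged input has to be created with an explicit dtype=object. The variable names flat and uniq are chosen here for illustration.

```python
import numpy as np

# Ragged input: each element of `a` is a Python list, not a row of numbers.
a = np.array([[1, 2, 3], [2, 3], [1]], dtype=object)
print(a.dtype, a.shape)  # object (3,)

# np.concatenate treats `a` as a sequence of three array-likes and joins
# them end to end into one 1-D numeric array.
flat = np.concatenate(a)

# With a flat numeric array, np.unique compares the numbers themselves.
uniq = np.unique(flat)
print(uniq)
```

One caveat: an empty inner list converts to an empty float64 array, so np.concatenate may promote the result to float if any inner list is empty.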
Upvotes: 5