Akshat Jain

Reputation: 111

How to get unique elements in numpy array with different number of elements in each array?

I want to get the unique elements of a 2D NumPy array, but the array looks like this:

import numpy as np
a = np.array([[1,2,3], [2,3], [1]], dtype=object)  # recent NumPy versions require dtype=object for ragged input
np.unique(a)

The inner lists have different numbers of elements, and I want a flattened array of the unique elements, like this:

[1,2,3]

But np.unique is not working as expected.

Upvotes: 4

Views: 3461

Answers (2)

pault

Reputation: 43494

Another way is to flatten the list using itertools.chain and then use np.unique(). This can be faster than np.concatenate() if you have a very large list.

For example, consider the following:

First generate random data:

from itertools import chain
import numpy as np
import pandas as pd

N = 100000
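# dtype=object is needed on recent NumPy versions because the inner lists have different lengths
a = np.array(
    [[np.random.randint(0,1000) for _ in range(np.random.randint(0,10))] for _ in range(N)],
    dtype=object
)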

Timing results:

%%timeit
np.unique(list(chain.from_iterable(a)))
#10 loops, best of 3: 66.7 ms per loop

%%timeit
np.unique(np.concatenate(a))
#10 loops, best of 3: 123 ms per loop

You could also use pandas.unique, which according to the docs:

Significantly faster than numpy.unique. Includes NA values.

%%timeit
pd.unique(np.concatenate(a))
#10 loops, best of 3: 107 ms per loop

%%timeit
pd.unique(list(chain.from_iterable(a)))
#10 loops, best of 3: 57.2 ms per loop
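As a quick sanity check of the itertools.chain approach on the small input from the question, here is a minimal sketch. The names rows, flat and unique_values are only illustrative, and it starts from a plain Python list rather than an object array; chain.from_iterable treats both the same way.

```python
from itertools import chain

import numpy as np

# Ragged input from the question, kept as a plain list of lists
rows = [[1, 2, 3], [2, 3], [1]]

# chain.from_iterable lazily walks each inner list in order;
# list() materializes the flattened sequence for np.unique
flat = list(chain.from_iterable(rows))

# np.unique returns the sorted unique values as a 1D array
unique_values = np.unique(flat)
print(unique_values)
```

The timings above depend on the machine and library versions, so it is worth re-running them on your own data before choosing one variant.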

Upvotes: 1

akuiper

Reputation: 214927

Because the inner lists have different lengths, you have an object-dtype array: a 1D array whose elements are Python lists. np.unique therefore compares the objects (the inner lists) against each other instead of comparing the numbers inside them. You need to flatten the array into a 1D array yourself with np.concatenate and then call np.unique:

np.unique(np.concatenate(a))
# array([1, 2, 3])
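To make the explanation concrete, the sketch below inspects the array before and after flattening. It assumes a recent NumPy version, where dtype=object has to be passed explicitly for ragged input; older versions created the object array implicitly.

```python
import numpy as np

# Ragged input: three inner lists of different lengths
# (dtype=object is an assumption needed for recent NumPy versions)
a = np.array([[1, 2, 3], [2, 3], [1]], dtype=object)

# The array is 1D and holds three Python list objects, not numbers
print(a.dtype)
print(a.shape)

# np.concatenate joins the inner lists into one flat numeric array
flat = np.concatenate(a)

# np.unique now operates on the individual numbers
result = np.unique(flat)
print(result)
```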

Upvotes: 5
