jmatthieu

Reputation: 113

Remove consecutive duplicates in a NumPy array

I would like to remove duplicates which follow each other, but not duplicates along the whole array. Also, I want to keep the ordering unchanged.

So if the input is [0 0 1 3 2 2 3 3] the output should be [0 1 3 2 3]

I found a way using itertools.groupby() but I am looking for a faster NumPy solution.
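
For reference, the groupby() approach presumably looks something like the sketch below. The exact code was not posted, so this is an assumption about the baseline being compared against:

```python
from itertools import groupby

import numpy as np

a = np.array([0, 0, 1, 3, 2, 2, 3, 3])

# groupby yields one (key, group) pair per run of equal consecutive values,
# so keeping only the keys drops the repeats within each run.
result = np.array([key for key, _group in groupby(a)])
print(result)  # [0 1 3 2 3]
```

This loops in Python once per element, which is why it gets slow on large arrays.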

Upvotes: 10

Views: 3660

Answers (3)

slouis

Reputation: 43

With NumPy >= 1.16.0 you can use the prepend argument of np.diff:

a[np.diff(a, prepend=np.nan).astype(bool)]
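
A step-by-step sketch of what that one-liner does, using the array from the question (the intermediate names d, mask and result are only for illustration):

```python
import numpy as np

a = np.array([0, 0, 1, 3, 2, 2, 3, 3])

# Prepending nan promotes the data to float and makes the first
# difference nan, so the output has the same length as the input.
d = np.diff(a, prepend=np.nan)

# nan and every non-zero difference convert to True; zero converts to False.
mask = d.astype(bool)

result = a[mask]
print(result)  # [0 1 3 2 3]
```

Two caveats follow from how this works: consecutive NaN values in a float array are not collapsed, because nan - nan is nan and therefore True, and very large integers may lose precision when promoted to float.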

Upvotes: 3

MaThMaX

Reputation: 2015

a[np.insert(np.diff(a).astype(bool), 0, True)]
Out[99]: array([0, 1, 3, 2, 3])

The general idea is to use diff to compute the difference between each pair of consecutive elements, then keep only the elements where that difference is non-zero. Because the result of diff is one element shorter than the input, a True has to be inserted at the beginning of the mask before indexing, so that the first element is always kept.

Explanation:

In [100]: a
Out[100]: array([0, 0, 1, 3, 2, 2, 3, 3])

In [101]: diff = np.diff(a).astype(bool)

In [102]: diff
Out[102]: array([False,  True,  True,  True, False,  True, False], dtype=bool)

In [103]: idx = np.insert(diff, 0, True)

In [104]: idx
Out[104]: array([ True, False,  True,  True,  True, False,  True, False], dtype=bool)

In [105]: a[idx]
Out[105]: array([0, 1, 3, 2, 3])
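
The same idea can be wrapped in a small helper. The sketch below is a variation rather than the exact code above: it compares neighbours with != instead of subtracting them, which should also work for non-numeric dtypes such as strings, and it handles empty input. The function name drop_consecutive_duplicates is made up for illustration, and 1-D input is assumed.

```python
import numpy as np


def drop_consecutive_duplicates(a):
    """Return a 1-D array with runs of equal neighbouring values collapsed."""
    a = np.asarray(a)
    if a.size == 0:
        return a  # nothing to compare, and the mask below needs one element
    keep = np.empty(a.shape[0], dtype=bool)
    keep[0] = True                 # the first element is always kept
    keep[1:] = a[1:] != a[:-1]     # keep an element if it differs from its predecessor
    return a[keep]


print(drop_consecutive_duplicates([0, 0, 1, 3, 2, 2, 3, 3]))  # [0 1 3 2 3]
print(drop_consecutive_duplicates(np.array(["a", "a", "b", "a"])))  # ['a' 'b' 'a']
```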

Upvotes: 18

Simon Kirsten

Reputation: 2577

For pure Python, which also works with NumPy arrays, use this:

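def modify(l):
    # Yield each element that differs from the one immediately before it.
    last = None
    for e in l:
        if e != last:
            yield e
        last = e

# modify() is a generator, so wrap it in list() to get the values.
pure = list(modify([0, 0, 1, 3, 2, 2, 3, 3]))

import numpy
# numpy.array() does not consume a generator, so build a list first.
num = numpy.array(list(modify(numpy.array([0, 0, 1, 3, 2, 2, 3, 3]))))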

I don't know whether there are any NumPy functions which would speed this up.
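
If the generator approach is kept, numpy.fromiter can build the array straight from the generator without an intermediate list, provided the dtype is given. A minimal sketch, where dedupe_consecutive is an illustrative name and the generator is a variant that needs no placeholder value for the first element:

```python
import numpy as np


def dedupe_consecutive(values):
    """Yield each value that differs from the one immediately before it."""
    it = iter(values)
    try:
        last = next(it)
    except StopIteration:
        return  # empty input: yield nothing
    yield last
    for e in it:
        if e != last:
            yield e
            last = e


arr = np.array([0, 0, 1, 3, 2, 2, 3, 3])
# fromiter consumes the generator element by element into a 1-D array.
num = np.fromiter(dedupe_consecutive(arr), dtype=arr.dtype)
print(num)  # [0 1 3 2 3]
```

This still runs a Python-level loop over every element, so it is not expected to match a vectorised mask for speed.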

Upvotes: 1
