Sid

Reputation: 286

Removing matching elements from two numpy arrays

Consider two sorted numpy arrays:

import numpy as np

a = np.array([1,2,4,4,6,8,10,10,21])
b = np.array([3,3,4,6,10,18,22])

How do I (1) find the elements that appear in both arrays, and (2) remove only one instance of each such element from each array?

That is, the output should be:

a = [1,2,4,8,10,21]

b = [3,3,18,22]

So even if there are duplicates, only one instance is removed. However, if the arrays are

c = np.array([1,2,4,4,6,8,10,10,10,21])
d = np.array([3,3,4,6,10,10,18,22])

I expect to obtain the new outputs:

c = [1,2,4,8,10,21]

d = [3,3,18,22]

which is the same as above. The only difference is the number of 10s: each of the two 10s in d cancels one 10 in c, so the result is unchanged.

This post was the closest match to my question, but it removed all instances of repeats from both lists.

Upvotes: 2

Views: 8143

Answers (6)

Paul Panzer

Reputation: 53029

Here is a numpy approach:

import numpy as np

a = np.array([1,2,4,4,6,8,10,10,21])
b = np.array([3,3,4,6,10,18,22])

# join and sort (with Tim sort this should be O(n))
ab = np.concatenate([a,b])
i = ab.argsort(kind="stable")
abo = ab[i]

# mark 1st of each group of equal values
d = np.flatnonzero(np.diff(abo,prepend=abo[0]-1,append=abo[-1]+1))
# mark sorted total by origin (a -> False, b -> True)
ig = i>=len(a)
# compare origins of first and last of each group of equal values
# if they are different mark for deletion
dupl = ig[d[:-1]] ^ ig[d[1:]-1]

# finally, delete
ar = np.delete(a,i[d[:-1][dupl]])
br = np.delete(b,i[d[1:][dupl]-1]-len(a))

# inspect
ar
array([ 1,  2,  4,  8, 10, 21])
br
array([ 3,  3, 18, 22])
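
To make the group-marking step easier to follow, here is a small sketch on a made-up sorted array. The names starts, first and last are only illustrative and do not appear in the code above.

```python
import numpy as np

# a small, already sorted array with repeated values (made-up data)
abo = np.array([1, 2, 3, 3, 4, 4, 4, 6, 6])

# a nonzero difference marks the position where a new value begins;
# the prepend/append sentinels force a mark at index 0 and at len(abo)
starts = np.flatnonzero(np.diff(abo, prepend=abo[0] - 1, append=abo[-1] + 1))
print(starts)  # [0 1 2 4 7 9]

# index of the first and of the last element of each group of equal values
first = starts[:-1]
last = starts[1:] - 1
print(first)  # [0 1 2 4 7]
print(last)   # [0 1 3 6 8]
```

In the answer's code, the origin flags at these first and last positions are compared, and a group is marked for deletion only when they differ.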

Upvotes: 0

A_K

Reputation: 16

Using for loops:

import numpy as np

a = np.array([1,2,4,4,6,8,10,10,21])
b = np.array([3,3,4,6,10,18,22])

# iterating over the original a is safe: np.delete returns a new array
for val in a:
    if val in b:
        # drop the first occurrence of val from each array
        a = np.delete(a, np.where(a == val)[0][0])
        b = np.delete(b, np.where(b == val)[0][0])

# same pass driven by b
for val in b:
    if val in a:
        a = np.delete(a, np.where(a == val)[0][0])
        b = np.delete(b, np.where(b == val)[0][0])

print(a)
print(b)

Outputs:

[ 1  2  4  8 10 21]
[ 3  3 18 22]

Upvotes: 0

RightmireM

Reputation: 2492

I'm not 100% sure what you're looking to do based on the question, but I have been able to duplicate the output using the methods described.

import numpy as np

# List of b that are not in a
a = np.array([1,2,4,4,6,8,10,10,21])
b = np.array([3,3,4,6,10,18,22])
newb = [x for x in b if x not in a]
print(newb)

# Remove from a one instance of each value that also appears in b
import collections
counter=collections.Counter(a)
print(counter)
newa = list(a)
for k in counter:
    if k in b:
        newa.remove(k)
print(newa)

Upvotes: 1

Kasravnd

Reputation: 107287

You can find the indices of the first occurrences of the intersecting items using np.searchsorted as follows, and then remove them with np.delete():

In [58]: intersect = a[np.in1d(a, b)]
In [59]: mask1 = np.searchsorted(a, intersect)

In [60]: mask2 = np.searchsorted(b, intersect)

In [61]: np.delete(a, mask1)
Out[61]: array([ 1,  2,  4,  8, 10, 21])

In [62]: np.delete(b, mask2)
Out[62]: array([ 3,  3, 18, 22])
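
For reference, the same steps can be written as a self-contained script. This is only a sketch: the function name remove_first_matches is hypothetical, and it uses np.isin, which the NumPy documentation recommends over np.in1d for new code. It assumes both inputs are sorted 1-D arrays, as in the question.

```python
import numpy as np


def remove_first_matches(a, b):
    """Drop the first occurrence of every shared value from two sorted arrays."""
    # values of a that also occur somewhere in b
    common = a[np.isin(a, b)]
    # on sorted input, searchsorted gives the index of the first occurrence
    idx_a = np.searchsorted(a, common)
    idx_b = np.searchsorted(b, common)
    # repeated indices are harmless: np.delete removes each position once
    return np.delete(a, idx_a), np.delete(b, idx_b)


a = np.array([1, 2, 4, 4, 6, 8, 10, 10, 21])
b = np.array([3, 3, 4, 6, 10, 18, 22])

new_a, new_b = remove_first_matches(a, b)
print(new_a)  # [ 1  2  4  8 10 21]
print(new_b)  # [ 3  3 18 22]
```

Note that this removes one occurrence per shared value, regardless of how many times the value is repeated in the other array.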

Upvotes: 2

Dani Mesejo

Reputation: 61910

You can use collections.Counter:

from collections import Counter

import numpy as np

a = np.array([1, 2, 4, 4, 6, 8, 10, 10, 21])
b = np.array([3, 3, 4, 6, 10, 18, 22])

# count plain Python ints so the printed lists show bare numbers
ca = Counter(a.tolist())
cb = Counter(b.tolist())

result_a = sorted((ca - cb).elements())
result_b = sorted((cb - ca).elements())

print(result_a)
print(result_b)

Output

[1, 2, 4, 8, 10, 21]
[3, 3, 18, 22]

As expected, it returns the same result for:

a = np.array([1, 2, 4, 4, 6, 8, 10, 10, 10, 21])
b = np.array([3, 3, 4, 6, 10, 10, 18, 22])
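
If the results should stay NumPy arrays, the same count-based subtraction can be sketched with np.unique and np.bincount. The helper name cancel_matches is hypothetical, and the sketch assumes 1-D integer arrays like the ones in the question.

```python
import numpy as np


def cancel_matches(a, b):
    """For every value, cancel min(count in a, count in b) copies from both arrays."""
    # map each element of the joined array to the index of its distinct value
    values, inverse = np.unique(np.concatenate([a, b]), return_inverse=True)
    count_a = np.bincount(inverse[:len(a)], minlength=len(values))
    count_b = np.bincount(inverse[len(a):], minlength=len(values))
    # keep only the surplus copies on each side; results come back sorted
    rest_a = np.repeat(values, np.maximum(count_a - count_b, 0))
    rest_b = np.repeat(values, np.maximum(count_b - count_a, 0))
    return rest_a, rest_b


c = np.array([1, 2, 4, 4, 6, 8, 10, 10, 10, 21])
d = np.array([3, 3, 4, 6, 10, 10, 18, 22])

new_c, new_d = cancel_matches(c, d)
print(new_c)  # [ 1  2  4  8 10 21]
print(new_d)  # [ 3  3 18 22]
```

Like the Counter subtraction, each 10 in d cancels one 10 in c, so a single 10 is left in c.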

Upvotes: 3

jfaccioni

Reputation: 7509

If you don't mind the verbosity:

import numpy as np

a = np.array([1,2,4,4,6,8,10,10,21])
b = np.array([3,3,4,6,10,18,22])

common_values = set(a) & set(b)

a = a.tolist()
b = b.tolist()

for value in common_values:
    a.remove(value)
    b.remove(value)

a = np.array(a)
b = np.array(b)

Upvotes: 0
