H.Burns

Reputation: 419

How to compare two numpy arrays and add missing values to the other with a tweak

I have two NumPy arrays with different numbers of rows. For every row of the larger array whose 0th element does not appear in the first column of the smaller array, I want to append a row to the smaller array that keeps that 0th element and sets the 1st element to 0.

For example :

a = [ [2,4],[4,5], [8,9],[7,5]]

b = [ [2,5], [4,6]]

After adding the missing elements to b, b would become as follows :

b = [ [2,5], [4,6], [8,0], [7,0] ]

I have implemented part of the logic, but some values are added more than once because I cannot check whether an element has already been added to b.

Also, I am doing this with an additional array c, which is a copy of b, and applying the operations to c. It would be very helpful if somebody could show me how to do this without the third array c.

import numpy as np

a = [[2,3],[4,5],[6,8], [9,6]]

b = [[2,3],[4,5]]

a = np.array(a)
b = np.array(b)
c = np.array(b)

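# Attempt: compare every row of b with every row of a.
# This appends a row for each mismatching (i, j) pair, which causes the duplicates.
for i in range(len(b)):
    for j in range(len(a)):
        if a[j,0] == b[i,0]:
            print("matched ")
        else:
            print("not matched")
            c= np.insert(c, len(c), [a[j,0], 0], axis = 0)
print(c)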

Upvotes: 3

Views: 4471

Answers (4)

Ratnajit Mukherjee

Reputation: 41

Assuming you are working with one-dimensional arrays:

import numpy as np
a = np.linspace(1, 90, 90)
b = np.array([1,2,3,4,5,6,7,8,9,10,11,13,14,15,16,17,18,19,20,
             21,22,23,24,25,27,28,31,32,33,34,35,36,37,38,39,
             40,41,42,43,44,46,47,48,49,50,51,52,53,54,55,56,
             57,58,59,60,61,62,63,64,65,67,70,72,73,74,75,76,
             77,78,79,80,81,82,84,85,86,87,88,89,90])

m_num = np.setxor1d(a, b).astype(np.uint8)
print("Total {0} numbers missing: {1}".format(len(m_num), m_num))

The same call also accepts 2D arrays; the result is still a 1D array of the missing values:

t1 = np.reshape(a, (10, 9))
t2 = np.reshape(b, (10, 8))
m_num2 = np.setxor1d(t1, t2).astype(np.uint8)
print("Total {0} numbers missing: {1}".format(len(m_num2), m_num2))

Upvotes: 0

hpaulj

Reputation: 231510

First we should clear up one misconception. c does not have to be a copy. A new variable assignment is sufficient.

c = b
...
    c= np.insert(c, len(c), [a[j,0], 0], axis = 0)

np.insert does not modify any of its inputs. Rather, it makes a new array, and the c=... just assigns that new array to c, replacing the original assignment. So the original c assignment only makes writing the iteration easier.
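A minimal sketch of that behavior, using the question's b and a made-up row [6, 0] purely for illustration:

```python
import numpy as np

b = np.array([[2, 3], [4, 5]])
c = b                      # plain assignment: c and b name the same array

# np.insert builds and returns a new array; b is left untouched
c = np.insert(c, len(c), [6, 0], axis=0)

print(b.shape)   # (2, 2)
print(c.shape)   # (3, 2)
print(c is b)    # False
```

After the call, c refers to the new three-row array while b still has its original two rows, which is why no explicit copy is needed.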

Since you are adding this new [a[j,0],0] at the end, you could use concatenate (the underlying function used by insert and the stack functions). The new row has to be 2D to match c, hence the extra brackets:

c = np.concatenate((c, [[a[j,0],0]]), axis=0)

That won't make much of a change in the run time. It's better to find all the a[j] and add them all at once.

In this case you want to add a[2,0] and a[3,0]. Leaving aside, for the moment, the question of how we find [2,3], we can do:

In [595]: a=np.array([[2,3],[4,5],[6,8],[9,6]])
In [596]: b=np.array([[2,3],[4,5]])
In [597]: ind = [2,3]

An assign and fill approach would look like:

In [605]: c = np.zeros_like(a)   # target array  
In [607]: c[0:b.shape[0],:] = b     # fill in the b values
In [608]: c[b.shape[0]:,0] = a[ind,0]    # fill in the selected a column

In [609]: c
Out[609]: 
array([[2, 3],
       [4, 5],
       [6, 0],
       [9, 0]])

A variation would be to construct a temporary array with the new a values, and concatenate:

In [613]: a1 = np.zeros((len(ind),2),a.dtype) 
In [614]: a1[:,0] = a[ind,0]   
In [616]: np.concatenate((b,a1),axis=0)
Out[616]: 
array([[2, 3],
       [4, 5],
       [6, 0],
       [9, 0]])

I'm using the a1 create and fill approach because I'm too lazy to figure out how to concatenate a[ind,0] with enough 0s to make the same thing. :)
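For what it's worth, one way to build those rows directly is np.column_stack, which pairs a[ind,0] with a matching column of zeros. This is only a sketch of an alternative, not the approach used above; the names first_col, new_rows and result are illustrative.

```python
import numpy as np

a = np.array([[2, 3], [4, 5], [6, 8], [9, 6]])
b = np.array([[2, 3], [4, 5]])
ind = [2, 3]               # rows of a that are missing from b

first_col = a[ind, 0]      # the values to keep: [6, 9]
# pair each kept value with a 0, giving a (len(ind), 2) array
new_rows = np.column_stack((first_col, np.zeros_like(first_col)))

result = np.concatenate((b, new_rows), axis=0)
print(result)
```

The result should match the Out[616] array above.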

As Divakar shows, np.in1d is a handy way of finding the matches:

In [617]: np.in1d(a[:,0],b[:,0])
Out[617]: array([ True,  True, False, False], dtype=bool)

In [618]: np.nonzero(~np.in1d(a[:,0],b[:,0]))
Out[618]: (array([2, 3], dtype=int32),)

In [619]: np.nonzero(~np.in1d(a[:,0],b[:,0]))[0]
Out[619]: array([2, 3], dtype=int32)

In [620]: ind=np.nonzero(~np.in1d(a[:,0],b[:,0]))[0]

If you don't care about the order, a[ind,0] can also be obtained with np.setdiff1d(a[:,0],b[:,0]) (the values will be sorted and unique).
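A short sketch of the np.setdiff1d route, run on the arrays from the question's first example (the names missing, new_rows and result are illustrative). Note that the appended rows come out in sorted order, not in the order they appear in a:

```python
import numpy as np

a = np.array([[2, 4], [4, 5], [8, 9], [7, 5]])
b = np.array([[2, 5], [4, 6]])

# sorted, unique first-column values of a that are absent from b
missing = np.setdiff1d(a[:, 0], b[:, 0])

# pair each missing value with a 0 and append to b
new_rows = np.column_stack((missing, np.zeros_like(missing)))
result = np.concatenate((b, new_rows), axis=0)
print(result)
```

Here 7 is appended before 8, whereas the question's expected output lists [8,0] first; use the mask-based indexing if the original order matters.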

Upvotes: 1

Divakar

Reputation: 221614

You can use np.in1d to build a mask of the rows in a whose first-column value already appears in the first column of b. Inverting the mask selects the missing rows, and multiplying them by [1,0] sets their second column to zero. That gives a vectorized approach, as shown below -

np.vstack((b,a[~np.in1d(a[:,0],b[:,0])]*[1,0]))

Sample run -

In [47]: a
Out[47]: 
array([[2, 4],
       [4, 5],
       [8, 9],
       [7, 5]])

In [48]: b
Out[48]: 
array([[8, 7],
       [4, 6]])

In [49]: np.vstack((b,a[~np.in1d(a[:,0],b[:,0])]*[1,0]))
Out[49]: 
array([[8, 7],
       [4, 6],
       [2, 0],
       [7, 0]])
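The same one-liner can be wrapped in a small function. The sketch below uses np.isin, which as far as I know is the replacement that recent NumPy releases recommend over np.in1d; the function name add_missing_rows is hypothetical, and the inputs are the arrays from the question's first example.

```python
import numpy as np

def add_missing_rows(a, b):
    """Hypothetical helper: append to b the rows of a whose first-column
    value is absent from b's first column, with the second column zeroed."""
    mask = ~np.isin(a[:, 0], b[:, 0])   # True where a's key is missing from b
    extra = a[mask] * [1, 0]            # keep column 0, zero column 1
    return np.vstack((b, extra))

a = np.array([[2, 4], [4, 5], [8, 9], [7, 5]])
b = np.array([[2, 5], [4, 6]])
print(add_missing_rows(a, b))
```

Because the mask indexes a directly, the appended rows keep the order in which they occur in a, and no intermediate array c is needed.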

Upvotes: 2

hashcode55

Reputation: 5860

# Plain-list version, for explanation (a and b are Python lists here).
# A basic set difference on the first elements gives the missing values.
c = set([i[0] for i in a]) - set([i[0] for i in b])
# c stores only the missing first elements; append each one with a 0.
for i in c:
    b.append([i, 0])

Output -

[[2, 5], [4, 6], [8, 0], [7, 0]]

Edit -

Since these are NumPy arrays, you can do the same thing in just two lines, without using c as an intermediate:

for i in set(a[:, 0]) - (set(b[:, 0])):
    b = np.append(b, [[i, 0]], axis = 0)

Output -

array([[2, 5],
       [4, 6],
       [8, 0],
       [7, 0]])
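One caveat: the iteration order of a Python set is not guaranteed, so the appended rows may not always come out in the order shown. A loop-free sketch that keeps the order of first appearance in a, and also drops repeated values (the redundancy the question mentions), could look like the following. The extra row [8, 1] is made up here to demonstrate the de-duplication, and the variable names are illustrative.

```python
import numpy as np

# [8, 1] is an invented extra row so that the value 8 occurs twice in a
a = np.array([[2, 4], [4, 5], [8, 9], [7, 5], [8, 1]])
b = np.array([[2, 5], [4, 6]])

# first-column values of a that do not occur in b, in original order
candidates = a[~np.isin(a[:, 0], b[:, 0]), 0]

# np.unique sorts, so recover first-appearance order from the returned indices
_, first_idx = np.unique(candidates, return_index=True)
missing = candidates[np.sort(first_idx)]

b = np.vstack((b, np.column_stack((missing, np.zeros_like(missing)))))
print(b)
```

Each missing value is appended exactly once, in the order it first appears in a.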

Upvotes: 3
