arun

Reputation: 138

Compare two NumPy arrays by their first column and create a third NumPy array by combining the two arrays

I have two 2D NumPy arrays that are used to plot simulation results.

The first column of both arrays a and b contains the time steps and the second column contains the data to be plotted. The two arrays have different shapes: a is (500, 2) and b is (600, 2). I want to compare the two arrays by their first column and create a third array that holds each row of a together with the value from b whose time matches. If no match is found, the third column should be 0.

Is there any numpy trick to do this?

For instance:

a = [[0.002, 0.998],
     [0.004, 0.997],
     [0.006, 0.996],
     [0.008, 0.995],
     [0.010, 0.993]]

b = [[0.002, 0.666],
     [0.004, 0.665],
     [0.0041, 0.664],
     [0.0042, 0.664],
     [0.0043, 0.664],
     [0.0044, 0.663],
     [0.0045, 0.663],
     [0.0005, 0.663],
     [0.006, 0.663],
     [0.0061, 0.662],
     [0.008, 0.661]]

Expected output:

c = [[0.002, 0.998, 0.666],
     [0.004, 0.997, 0.665],
     [0.006, 0.996, 0.663],
     [0.008, 0.995, 0.661],
     [0.010, 0.993, 0    ]]

Upvotes: 0

Views: 2593

Answers (3)

Iosif Serafeimidis

Reputation: 102

The following works for both NumPy arrays and plain Python lists (rows of a without a match are appended at the end of c):

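# Rows of a that have a matching time in b, with b's value appended
c = [[*x, y[1]] for x in a for y in b if x[0] == y[0]]
# Rows of a with no matching time in b, with 0 appended
d = [[*x, 0] for x in a if x[0] not in [y[0] for y in b]]
c.extend(d)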

Someone braver than I am could try to make this one line.
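For a single line, one option (a sketch, not part of the original answer) is to build a lookup dict from b first and then make one pass over a. This assumes the times in b are unique: with duplicates the dict keeps the last value for each time, whereas the comprehension above emits one row per match. It also keeps the rows in the order of a. The name lookup is only illustrative, and the arrays are the ones from the question.

```python
import numpy as np

a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])
b = np.array([[0.002, 0.666], [0.004, 0.665], [0.0041, 0.664],
              [0.0042, 0.664], [0.0043, 0.664], [0.0044, 0.663],
              [0.0045, 0.663], [0.0005, 0.663], [0.006, 0.663],
              [0.0061, 0.662], [0.008, 0.661]])

# Map each time in b to its value (assumes no duplicate times in b)
lookup = {t: v for t, v in b}

# The one-liner: one output row per row of a, 0 when the time is not in b
c = [[t, v, lookup.get(t, 0)] for t, v in a]
```

Dictionary lookups use exact equality, so this has the same floating-point caveat as the comprehension above: times that differ only by rounding noise will not match.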

Upvotes: 0

Ahmed Fasih

Reputation: 6927

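import numpy as np

# a and b are NumPy arrays; assumes the matching times appear in ascending
# order in both first columns and that neither column repeats a time
i = np.intersect1d(a[:, 0], b[:, 0])  # times present in both a and b
overlap = np.vstack([i, a[np.isin(a[:, 0], i), 1], b[np.isin(b[:, 0], i), 1]]).T
underlap = np.setdiff1d(a[:, 0], b[:, 0])  # times in a that are missing from b
underlap = np.vstack([underlap, a[np.isin(a[:, 0], underlap), 1], np.zeros_like(underlap)]).T
fast_c = np.vstack([overlap, underlap])  # matched rows first, then unmatched rows

This works by taking the intersection of the first columns of a and b with intersect1d, and then using isin (the replacement for in1d, which is deprecated as of NumPy 2.0) to cross-reference that intersection with the second columns.

vstack stacks its inputs as rows, so the transpose (.T) is needed to turn them into the three columns; the transpose itself is very fast because it returns a view rather than copying data.

Then setdiff1d finds the times in a that are not in b, and the result is completed by putting 0s in the third column.

This gives the following. (Only the first two times match here, so this output apparently comes from a b without the 0.006 and 0.008 rows; with the b from the question, those rows get 0.663 and 0.661, as in the expected output.)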

array([[ 0.002,  0.998,  0.666],
       [ 0.004,  0.997,  0.665],
       [ 0.006,  0.996,  0.   ],
       [ 0.008,  0.995,  0.   ],
       [ 0.01 ,  0.993,  0.   ]])
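A possible vectorized variant, sketched here rather than taken from the answer, keeps the rows in the order of a, does not require b to be sorted, and avoids the length mismatch the intersect1d version can hit when b repeats a time. It sorts b once with np.argsort and looks up each time of a with np.searchsorted; the names order, times, values, pos and found are illustrative. As with the other answers, times must match exactly.

```python
import numpy as np

a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])
b = np.array([[0.002, 0.666], [0.004, 0.665], [0.0041, 0.664],
              [0.0042, 0.664], [0.0043, 0.664], [0.0044, 0.663],
              [0.0045, 0.663], [0.0005, 0.663], [0.006, 0.663],
              [0.0061, 0.662], [0.008, 0.661]])

# Sort b by time; a stable sort keeps the first occurrence of a repeated time first
order = np.argsort(b[:, 0], kind="stable")
times = b[order, 0]
values = b[order, 1]

# Leftmost position where each time of a would be inserted into the sorted times
pos = np.searchsorted(times, a[:, 0])
# Times larger than every time in b give len(times); clip so they can be used as indices
pos = np.clip(pos, 0, len(times) - 1)
# A row of a has a match only if the time at that position is exactly equal
found = times[pos] == a[:, 0]

# Append the matched value from b, or 0.0 where there is no match
c = np.column_stack([a, np.where(found, values[pos], 0.0)])
```

Sorting costs O(m log m) for m rows of b and each lookup is a binary search, so this scales better than comparing every pair of times.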

Upvotes: 2

Anoop

Reputation: 5720

A straightforward loop-based solution:

import numpy as np

a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])

b = np.array([[0.002, 0.666],
              [0.004, 0.665],
              [0.0041, 0.664],
              [0.0042, 0.664],
              [0.0043, 0.664],
              [0.0044, 0.663],
              [0.0045, 0.663],
              [0.0005, 0.663],
              [0.006, 0.663],
              [0.0061, 0.662],
              [0.008, 0.661]])


c = []
for row in a:
    # Indices of the rows in b whose time equals this row's time
    index = np.where(b[:, 0] == row[0])[0]
    if np.size(index) != 0:
        # Use the value from the first matching row of b
        c.append([row[0], row[1], b[index[0], 1]])
    else:
        # No matching time in b
        c.append([row[0], row[1], 0])

c = np.array(c)
print(c)

As pointed out in the comments above, there seems to be a data entry error in the sample data.
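The loop can also be vectorized with broadcasting. This is a sketch rather than the answerer's code, using the arrays from the question: it builds a len(a) by len(b) boolean matrix of exact time matches (300,000 entries for the 500 and 600 row arrays in the question, which is small), and np.argmax picks the first match in each row, mirroring b[index[0], 1] in the loop. The names eq, has_match and first are illustrative.

```python
import numpy as np

a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])
b = np.array([[0.002, 0.666], [0.004, 0.665], [0.0041, 0.664],
              [0.0042, 0.664], [0.0043, 0.664], [0.0044, 0.663],
              [0.0045, 0.663], [0.0005, 0.663], [0.006, 0.663],
              [0.0061, 0.662], [0.008, 0.661]])

# eq[i, j] is True when the time in row i of a equals the time in row j of b
eq = a[:, 0][:, None] == b[:, 0][None, :]
# True for rows of a that have at least one match in b
has_match = eq.any(axis=1)
# Index of the first True in each row (0 for rows without any match)
first = eq.argmax(axis=1)

# Same result as the loop: first matching value from b, or 0 when there is none
c = np.column_stack([a, np.where(has_match, b[first, 1], 0)])
```

If the simulation times can differ by rounding noise, np.isclose could be tried in place of ==, since it broadcasts the same way; its default tolerances (rtol=1e-05, atol=1e-08) still keep 0.004 and 0.0041 apart, but they may need tuning for other time steps.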

Upvotes: 2
