Reputation: 1325

numpy - align 2 vectors with potentially missing values

I have two NumPy matrices, X (first) and Y (second), with slightly different alignment:

    id,  value
     1,   0.78
     2,   0.65
     3,   0.77
       ...
       ...
    98,   0.88
    99,   0.77
   100,   0.87

    id,  value
     1,   0.79
     2,   0.65
     3,   0.78
       ...
       ...
    98,   0.89
   100,   0.80

Y is simply missing a particular ID (99 in this example). I would like to perform vector operations on X and Y (e.g. correlation, difference, etc.), which means I need to drop the row in X that corresponds to the missing ID. How would I do that?

Upvotes: 1

Answers (3)

Gulzar

Reputation: 28014

This assumes the shared values are identical in both arrays, so the extra element in x is the difference between the two sums.

This solution is O(n); other solutions here that rely on a sort-based membership test are closer to O(n log n).

Data generation:

import numpy as np

# x = np.arange(10)
x = np.random.rand(10)
y = np.r_[x[:6], x[7:]]  # copy of x without the element at index 6
print(x)
np.random.shuffle(y)
print(y)

Solution:

Note that np.isclose() is used for the floating-point comparison.

sum_x = np.sum(x)
sum_y = np.sum(y)
diff = sum_x - sum_y
value_index = np.argwhere(np.isclose(x, diff))

print(value_index)

Delete the element at that index:

deleted = np.delete(x, value_index)
print(deleted)

Output:

[0.36373441 0.5030346  0.895204   0.03352821 0.20693263 0.28651572
 0.25859596 0.97969841 0.77368822 0.80105397]
[0.97969841 0.77368822 0.28651572 0.36373441 0.5030346  0.895204
 0.03352821 0.80105397 0.20693263]
[[6]]
[0.36373441 0.5030346  0.895204   0.03352821 0.20693263 0.28651572
 0.97969841 0.77368822 0.80105397]
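
In the question the values differ slightly between X and Y, so the sum trick would not work on the value column, but the same idea can be applied to the shared ID column. A minimal sketch, assuming exactly one ID is missing from Y, IDs are unique, and both arrays are sorted by ID; the small X and Y arrays below are made-up stand-ins for the asker's data:

```python
import numpy as np

# Hypothetical stand-ins for the question's data: column 0 is the id, column 1 the value.
X = np.array([[1, 0.78], [2, 0.65], [3, 0.77], [4, 0.88], [5, 0.87]])
Y = np.array([[1, 0.79], [2, 0.65], [3, 0.78], [5, 0.80]])  # id 4 is missing

# Assumption: exactly one id is missing from Y, so the id sums differ by that id.
missing_id = X[:, 0].sum() - Y[:, 0].sum()

# Keep only the rows of X whose id is not the missing one.
X_aligned = X[~np.isclose(X[:, 0], missing_id)]

print(missing_id)
print(X_aligned[:, 1] - Y[:, 1])  # element-wise difference now lines up by id
```

If more than one ID can be missing, the difference of sums no longer identifies a single ID and a membership test on the ID column is the safer route.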

Upvotes: 2

Gonzalo Zabala

Reputation: 16

If the missing values are stored as NaN, you can try this:

X = X[~numpy.isnan(X)]
Y = Y[~numpy.isnan(Y)]

After that you can perform whatever operations you want.
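
Note that filtering each array separately can leave them with different lengths when the NaNs sit at different positions. A combined mask keeps the two vectors aligned. This is only a sketch, assuming both arrays have the same length and mark missing entries with NaN, which is not the layout shown in the question; the data is made up for illustration:

```python
import numpy as np

# Hypothetical data: same length, with missing entries stored as NaN.
x = np.array([0.78, 0.65, 0.77, 0.88, 0.87])
y = np.array([0.79, 0.65, 0.78, np.nan, 0.80])

# Keep only the positions where both arrays have a value.
mask = ~np.isnan(x) & ~np.isnan(y)
x_clean, y_clean = x[mask], y[mask]

print(x_clean - y_clean)
print(np.corrcoef(x_clean, y_clean)[0, 1])
```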

Upvotes: 0

Corralien

Reputation: 120469

Use np.in1d on the ID column (np.isin is the recommended replacement in newer NumPy versions):

>>> X
array([[ 1.  ,  0.53],
       [ 2.  ,  0.72],
       [ 3.  ,  0.44],
       [ 4.  ,  0.35],
       [ 5.  ,  0.32],
       [ 6.  ,  0.14],
       [ 7.  ,  0.52],
       [ 8.  ,  0.4 ],
       [ 9.  ,  0.1 ],
       [10.  ,  0.1 ]])

>>> Y
array([[ 1.  ,  0.19],
       [ 2.  ,  0.96],
       [ 3.  ,  0.24],
       [ 4.  ,  0.44],
       [ 5.  ,  0.12],
       [ 6.  ,  0.91],
       [ 7.  ,  0.7 ],
       [ 8.  ,  0.54],
       [10.  ,  0.09]])

>>> X[np.in1d(X[:, 0], Y[:, 0])]
array([[ 1.  ,  0.53],
       [ 2.  ,  0.72],
       [ 3.  ,  0.44],
       [ 4.  ,  0.35],
       [ 5.  ,  0.32],
       [ 6.  ,  0.14],
       [ 7.  ,  0.52],
       [ 8.  ,  0.4 ],
       [10.  ,  0.1 ]])
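
Once the rows are filtered, the value columns can be compared directly. If IDs might be missing on either side, or the rows are not in the same order, np.intersect1d with return_indices=True gives the matching row positions in both arrays. A minimal sketch, assuming IDs are unique within each array; the data below is made up for illustration:

```python
import numpy as np

# Hypothetical data: column 0 is the id, column 1 the value.
X = np.array([[1, 0.53], [2, 0.72], [3, 0.44], [4, 0.35]])
Y = np.array([[1, 0.19], [3, 0.24], [4, 0.44], [5, 0.12]])  # id 2 missing in Y, id 5 missing in X

# Ids present in both arrays, plus the row positions of those ids in X and in Y.
common_ids, ix, iy = np.intersect1d(X[:, 0], Y[:, 0], return_indices=True)

x_vals = X[ix, 1]
y_vals = Y[iy, 1]

print(common_ids)
print(x_vals - y_vals)                    # element-wise difference on matching ids
print(np.corrcoef(x_vals, y_vals)[0, 1])  # correlation on matching ids
```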