user3240688

Reputation: 1325

numpy - align 2 vectors with potentially missing values

I have two NumPy matrices whose rows are slightly misaligned:

X

    id,  value
     1,   0.78
     2,   0.65
     3,   0.77
       ...
       ...
    98,   0.88
    99,   0.77
   100,   0.87

Y

    id,  value
     1,   0.79
     2,   0.65
     3,   0.78
       ...
       ...
    98,   0.89
   100,   0.80

Y is simply missing a particular ID. I would like to perform vector operations on X and Y (e.g. correlation, difference, etc.), which means I need to drop the corresponding row from X. How would I do that?

Upvotes: 1

Views: 156

Answers (3)

Gulzar

Reputation: 28014

Apart from the one missing element, x and y hold the same values, so the extra element in x equals the difference between the two sums.

This solution is O(n); the other solutions here are O(n^2).

Data generation:

import numpy as np

# x = np.arange(10)
x = np.random.rand(10)
y = np.r_[x[:6], x[7:]]  # exclude 6
print(x)
np.random.shuffle(y)
print(y)

Solution:

Note that np.isclose() is used for the floating-point comparison.

sum_x = np.sum(x)
sum_y = np.sum(y)
diff = sum_x - sum_y
value_index = np.argwhere(np.isclose(x, diff))

print(value_index)

Delete the element at that index:

deleted = np.delete(x, value_index)
print(deleted)

Output:

[0.36373441 0.5030346  0.895204   0.03352821 0.20693263 0.28651572
 0.25859596 0.97969841 0.77368822 0.80105397]
[0.97969841 0.77368822 0.28651572 0.36373441 0.5030346  0.895204
 0.03352821 0.80105397 0.20693263]
[[6]]
[0.36373441 0.5030346  0.895204   0.03352821 0.20693263 0.28651572
 0.97969841 0.77368822 0.80105397]
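
The same idea can be wrapped in a small helper. This is only a sketch: the function name `find_extra_index` and the sample values are made up for illustration, and it assumes exactly one element is missing from y while every other value is identical in both arrays. That does not hold for the X and Y in the question, where the values themselves differ slightly.

```python
import numpy as np

def find_extra_index(x, y):
    """Return the indices of x whose value matches sum(x) - sum(y).

    Hypothetical helper: assumes y is x with exactly one element removed.
    """
    diff = np.sum(x) - np.sum(y)
    # np.isclose tolerates the rounding error accumulated by the two sums
    return np.flatnonzero(np.isclose(x, diff))

# Illustrative data, not taken from the question
x = np.array([0.36, 0.50, 0.90, 0.03, 0.21, 0.29, 0.26, 0.98, 0.77, 0.80])
y = np.delete(x, 6)  # y lacks the element at index 6

idx = find_extra_index(x, y)
x_aligned = np.delete(x, idx)
```

If two elements of x happen to be equal to the missing value, both indices are returned, so the result should be checked before deleting.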

Upvotes: 2

Gonzalo Zabala

Reputation: 16

You can try this:

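import numpy
# Drop every row that contains a NaN, keeping the (id, value) columns intact
X = X[~numpy.isnan(X).any(axis=1)]
Y = Y[~numpy.isnan(Y).any(axis=1)]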

After that, you can perform whatever operation you want.
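
NaN filtering only aligns the two vectors if the missing entry is actually stored as NaN at the same position in both. A minimal sketch under that assumption (the arrays below are hypothetical and already have equal length):

```python
import numpy as np

# Hypothetical value vectors of equal length; y holds NaN where its ID is missing
x = np.array([0.78, 0.65, 0.77, 0.88])
y = np.array([0.79, np.nan, 0.78, 0.89])

# Keep only the positions where both vectors have a real value
valid = ~np.isnan(x) & ~np.isnan(y)
x_valid, y_valid = x[valid], y[valid]

diff = x_valid - y_valid
```

Using one combined mask for both arrays keeps them the same length, which filtering each array separately would not guarantee.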

Upvotes: 0

Corralien

Reputation: 120469

Use np.in1d on the ID columns to keep only the rows of X whose ID also appears in Y:

>>> X
array([[ 1.  ,  0.53],
       [ 2.  ,  0.72],
       [ 3.  ,  0.44],
       [ 4.  ,  0.35],
       [ 5.  ,  0.32],
       [ 6.  ,  0.14],
       [ 7.  ,  0.52],
       [ 8.  ,  0.4 ],
       [ 9.  ,  0.1 ],
       [10.  ,  0.1 ]])

>>> Y
array([[ 1.  ,  0.19],
       [ 2.  ,  0.96],
       [ 3.  ,  0.24],
       [ 4.  ,  0.44],
       [ 5.  ,  0.12],
       [ 6.  ,  0.91],
       [ 7.  ,  0.7 ],
       [ 8.  ,  0.54],
       [10.  ,  0.09]])
>>> X[np.in1d(X[:, 0], Y[:, 0])]
array([[ 1.  ,  0.53],
       [ 2.  ,  0.72],
       [ 3.  ,  0.44],
       [ 4.  ,  0.35],
       [ 5.  ,  0.32],
       [ 6.  ,  0.14],
       [ 7.  ,  0.52],
       [ 8.  ,  0.4 ],
       [10.  ,  0.1 ]])
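
If IDs could be missing from either array, one option is to match on the ID column in both directions. The sketch below uses np.intersect1d with return_indices=True; the small arrays are made up for illustration, and it assumes the IDs are unique within each array. (Recent NumPy releases also recommend np.isin in place of np.in1d for the one-directional mask.)

```python
import numpy as np

# Illustrative data: column 0 is the id, column 1 is the value
X = np.array([[1, 0.78], [2, 0.65], [3, 0.77], [4, 0.88]])
Y = np.array([[1, 0.79], [2, 0.65], [4, 0.89]])  # id 3 is missing

# ix and iy are the row positions of the shared ids in X and Y respectively
common_ids, ix, iy = np.intersect1d(X[:, 0], Y[:, 0], return_indices=True)

x_vals = X[ix, 1]
y_vals = Y[iy, 1]

# The aligned vectors can now be used in element-wise operations
diff = x_vals - y_vals
corr = np.corrcoef(x_vals, y_vals)[0, 1]
```

Because the indices come back ordered by ID, this also works when the rows of X and Y are not sorted the same way.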

Upvotes: 0
