Reputation: 1325
I have two NumPy matrices whose rows are slightly misaligned:
X
id, value
1, 0.78
2, 0.65
3, 0.77
...
...
98, 0.88
99, 0.77
100, 0.87
Y
id, value
1, 0.79
2, 0.65
3, 0.78
...
...
98, 0.89
100, 0.80
Y is simply missing one particular ID (99 in this example). I would like to perform vector operations on X and Y (e.g. correlation, difference, etc.), which means I need to drop the row in X whose ID is missing from Y. How would I do that?
Upvotes: 1
Views: 156
Reputation: 28014
Assuming all the other values are the same in both arrays, the extra element in x is equal to the difference between the two sums.
This solution is O(n), which is cheaper than comparing every pair of elements (O(n^2)).
import numpy as np
# x = np.arange(10)
x = np.random.rand(10)
y = np.r_[x[:6], x[7:]] # copy of x without the element at index 6
print(x)
np.random.shuffle(y)
print(y)
Note that np.isclose() is used for the floating-point comparison.
sum_x = np.sum(x)
sum_y = np.sum(y)
diff = sum_x - sum_y
value_index = np.argwhere(np.isclose(x, diff))
print(value_index)
deleted = np.delete(x, value_index)
print(deleted)
[0.36373441 0.5030346 0.895204 0.03352821 0.20693263 0.28651572
0.25859596 0.97969841 0.77368822 0.80105397]
[0.97969841 0.77368822 0.28651572 0.36373441 0.5030346 0.895204
0.03352821 0.80105397 0.20693263]
[[6]]
[0.36373441 0.5030346 0.895204 0.03352821 0.20693263 0.28651572
0.97969841 0.77368822 0.80105397]
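The same idea can be wrapped in a small helper that refuses to guess when the match is ambiguous. This is only a sketch: the function name drop_extra_element is hypothetical, and it assumes y contains exactly the values of x minus one element, in any order. It does not apply when the shared values differ between the two arrays.

```python
import numpy as np

def drop_extra_element(x, y):
    """Return x without the single element that is absent from y.

    Assumes y holds the same values as x, minus exactly one, in any order.
    """
    # The missing value is the difference between the two sums.
    diff = np.sum(x) - np.sum(y)
    # Floating-point sums are inexact, so compare with a tolerance.
    idx = np.flatnonzero(np.isclose(x, diff))
    if idx.size != 1:
        # Zero or several candidates: the assumption above does not hold.
        raise ValueError("could not identify a unique extra element")
    return np.delete(x, idx[0])

x = np.array([0.5, 0.25, 0.75, 0.125])
y = np.array([0.125, 0.5, 0.25])  # shuffled x without 0.75
result = drop_extra_element(x, y)
print(result)
```

If x contains duplicates of the missing value, or values close enough to pass np.isclose(), the helper raises instead of deleting an arbitrary element.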
Upvotes: 2
Reputation: 16
You can try this:
X = X[~numpy.isnan(X)]
Y = Y[~numpy.isnan(Y)]
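After that you can perform whatever operation you want. Note that this only helps if the missing entries are stored as NaN, and that boolean masking of a 2-D array returns a flattened 1-D array.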
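If the missing entries really are stored as NaN, filtering each array separately can leave them with different lengths. A minimal sketch, assuming two hypothetical value vectors x and y that are indexed by the same ids, uses a single shared mask so the two stay aligned:

```python
import numpy as np

# Hypothetical value vectors indexed by the same ids; NaN marks a missing id.
x = np.array([0.78, 0.65, 0.77, 0.88])
y = np.array([0.79, 0.65, np.nan, 0.89])

# Keep only the positions where both vectors have a value.
valid = ~np.isnan(x) & ~np.isnan(y)
x_valid, y_valid = x[valid], y[valid]

# Element-wise operations now work on aligned data.
print(x_valid - y_valid)
```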
Upvotes: 0
Reputation: 120469
Use in1d:
>>> X
array([[ 1. , 0.53],
[ 2. , 0.72],
[ 3. , 0.44],
[ 4. , 0.35],
[ 5. , 0.32],
[ 6. , 0.14],
[ 7. , 0.52],
[ 8. , 0.4 ],
[ 9. , 0.1 ],
[10. , 0.1 ]])
>>> Y
array([[ 1. , 0.19],
[ 2. , 0.96],
[ 3. , 0.24],
[ 4. , 0.44],
[ 5. , 0.12],
[ 6. , 0.91],
[ 7. , 0.7 ],
[ 8. , 0.54],
[10. , 0.09]])
>>> X[np.in1d(X[:, 0], Y[:, 0])]
array([[ 1. , 0.53],
[ 2. , 0.72],
[ 3. , 0.44],
[ 4. , 0.35],
[ 5. , 0.32],
[ 6. , 0.14],
[ 7. , 0.52],
[ 8. , 0.4 ],
[10. , 0.1 ]])
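To go from the filtered rows to the actual vector operations, something like the following may work. It is a sketch with made-up sample data, and it assumes both arrays are sorted by id and that ids are unique. It uses np.isin, the newer equivalent of np.in1d (which is deprecated as of NumPy 2.0), and filters in both directions in case X is also missing ids.

```python
import numpy as np

# Hypothetical sample data: column 0 is the id, column 1 is the value.
X = np.array([[1, 0.78], [2, 0.65], [3, 0.77], [4, 0.88]])
Y = np.array([[1, 0.79], [2, 0.65], [4, 0.89]])  # id 3 is missing

# Keep only the rows whose id also appears in the other array.
X_common = X[np.isin(X[:, 0], Y[:, 0])]
Y_common = Y[np.isin(Y[:, 0], X[:, 0])]

# Both arrays are sorted by id, so the remaining rows line up one to one.
diff = X_common[:, 1] - Y_common[:, 1]
corr = np.corrcoef(X_common[:, 1], Y_common[:, 1])[0, 1]

print(X_common[:, 0])  # ids kept in both arrays
print(diff)
print(corr)
```

If the arrays are not sorted by id, sort each one first (for example with X[np.argsort(X[:, 0])]) before comparing values row by row.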
Upvotes: 0