Reputation: 148
So I have a source array like this:
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 100]
[ 0 100 33 100]
[ 3 110 22 100]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 100]]
and I want to update it with the rows of this second array, matching rows on the first column:
[[ 3 110 22 105]
[ 5 105 17 110]
[ 1 95 28 115]]
so that the result looks like this:
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 110]
[ 0 100 33 100]
[ 3 110 22 105]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 115]]
but I can't find a NumPy function that does this directly, so for now I have nothing better than this function I wrote:
import numpy as np

def update_ary_with_ary(source, updates):
    # For each update row, overwrite the source row(s) whose first column matches.
    for x in updates:
        index_of_col = np.argwhere(source[:, 0] == x[0])
        source[index_of_col] = x
This function loops in Python, so it is neither elegant nor fast. I will use it until someone gives me a better way using NumPy itself. I don't want a solution from another library, just NumPy.
Upvotes: 0
Views: 153
Reputation: 3722
Assuming your source array is s and your update array is u, and assuming that s and u are not huge, you can do:
update_row_ids = np.nonzero(s[:,0] == u[:,0].reshape(-1,1))[1]
s[update_row_ids] = u
Testing:
import numpy as np
s = np.array(
[[ 9, 85, 32, 100],
[ 7, 80, 30, 100],
[ 2, 90, 16, 100],
[ 6, 120, 22, 100],
[ 5, 105, 17, 100],
[ 0, 100, 33, 100],
[ 3, 110, 22, 100],
[ 4, 80, 22, 100],
[ 8, 115, 19, 100],
[ 1, 95, 28, 100]])
u = np.array(
[[ 3, 110, 22, 105],
[ 5, 105, 17, 110],
[ 1, 95, 28, 115]])
update_row_ids = np.nonzero(s[:,0] == u[:,0].reshape(-1,1))[1]
s[update_row_ids] = u
print(s)
This prints:
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 110]
[ 0 100 33 100]
[ 3 110 22 105]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 115]]
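To see why the two lines above work, it may help to look at the intermediate comparison. The following is only a minimal sketch on smaller made-up arrays; the data and the name `match` are illustrative and not part of the original answer.

```python
import numpy as np

# Small illustrative arrays (not the OP's data): column 0 is the key.
s = np.array([[9, 85, 100],
              [5, 105, 100],
              [3, 110, 100],
              [1, 95, 100]])
u = np.array([[3, 110, 105],
              [1, 95, 115]])

# Broadcasting the 4 source keys against a (2, 1) column of update keys
# gives a (len(u), len(s)) boolean matrix: match[i, j] is True when
# update row i has the same key as source row j.
match = s[:, 0] == u[:, 0].reshape(-1, 1)
print(match.shape)  # (2, 4)

# np.nonzero returns (row indices, column indices) of the True cells,
# ordered row by row; the column indices are row positions in s.
update_row_ids = np.nonzero(match)[1]
print(update_row_ids)  # [2 3]

s[update_row_ids] = u
```

The intermediate matrix has len(u) × len(s) entries, which is presumably why the answer assumes the arrays are not huge. The sketch also assumes that every key in u appears exactly once in s; otherwise the row ids no longer line up one-to-one with the rows of u.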
Edit: OP has provided the following additional details:
Based on this additional detail, the following alternative might perform better, especially if the rows of the source array are not sorted on the first column:
sorted_idx = np.argsort(s[:,0])
pos = np.searchsorted(s[:,0],u[:,0],sorter=sorted_idx)
update_row_ids = sorted_idx[pos]
s[update_row_ids] = u
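The searchsorted approach above silently assumes that every key in u exists in s. As a hedged sketch, it could be wrapped in a helper that skips unknown keys. The function name `update_by_key` and the extra key 42 are hypothetical, and the keys in s[:, 0] are assumed to be unique.

```python
import numpy as np

def update_by_key(s, u):
    """Overwrite rows of s in place with rows of u sharing the column-0 key.

    Hypothetical helper; assumes s is non-empty and its keys are unique.
    Returns a boolean mask telling which rows of u were applied.
    """
    keys = s[:, 0]
    sorted_idx = np.argsort(keys)
    # Position of each update key within the sorted order of the source keys.
    pos = np.searchsorted(keys, u[:, 0], sorter=sorted_idx)
    # Keys larger than every source key give pos == len(s); clip to stay in bounds.
    pos = np.clip(pos, 0, len(s) - 1)
    row_ids = sorted_idx[pos]
    # Keep only the updates whose key really exists in s.
    found = keys[row_ids] == u[:, 0]
    s[row_ids[found]] = u[found]
    return found

s = np.array([[9, 85, 32, 100], [7, 80, 30, 100], [2, 90, 16, 100],
              [6, 120, 22, 100], [5, 105, 17, 100], [0, 100, 33, 100],
              [3, 110, 22, 100], [4, 80, 22, 100], [8, 115, 19, 100],
              [1, 95, 28, 100]])
# The row with key 42 is made up to show an update that matches nothing.
u = np.array([[3, 110, 22, 105], [42, 0, 0, 0], [1, 95, 28, 115]])

found = update_by_key(s, u)
print(found)  # [ True False  True]
```

Sorting costs O(n log n) once, and each lookup is a binary search, so no len(u) × len(s) matrix is built.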
Upvotes: 1
Reputation: 148
fountainhead, your answer works correctly and, yes, it uses only NumPy. But in my performance test it doubled the time to process 50K rows in my simulation program, from 22 seconds to 44 seconds, and I don't know why. Still, your answer helped me find the right solution, which is just this line:
source[updates[:,0]] = updates
# or
s[u[:,0]] = u
With this, the processing time for 100K rows drops to only 0.5 seconds, and I can process 1M rows in only 5 seconds. I am still learning Python and data mining, and I am shocked by these numbers; I have never seen this in other languages. I can work with huge arrays as if they were regular variables. You can see the code on my GitHub:
https://github.com/qahmad81/war_simulation
fountainhead, you should get the accepted answer, but visitors should know which answer is best to use.
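One caveat about the one-line shortcut `s[u[:,0]] = u`: it uses the keys as row positions, so it only gives the intended result when row i of the source holds key i. A minimal sketch, assuming the keys are exactly 0..len(s)-1 with no gaps or duplicates (the small arrays here are illustrative):

```python
import numpy as np

s = np.array([[2, 90, 100],
              [0, 100, 100],
              [3, 110, 100],
              [1, 95, 100]])
u = np.array([[3, 110, 105],
              [1, 95, 115]])

# Assumption: the keys in column 0 are exactly 0..len(s)-1, each once.
# Sorting by key makes "row position == key" true, so the keys can then
# be used directly as row indices.
s = s[np.argsort(s[:, 0])]
assert np.array_equal(s[:, 0], np.arange(len(s)))

s[u[:, 0]] = u
print(s)
```

On the unsorted source array from the question, the same line would overwrite the rows at positions 3, 5 and 1 (which hold keys 6, 0 and 7) rather than the rows whose keys are 3, 5 and 1, so the sort, or an array that is already ordered by key, seems to be a precondition.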
Upvotes: 0