apitsch

Reputation: 1702

Change certain values in 2D numpy array based on values in 1D array without for-loop

My question seems so basic to me that I am a little embarrassed not to have solved it myself. Despite consulting this, this and that, I couldn't figure out how to change certain values in a 2D numpy array based on values in a 1D numpy array without using a for-loop.

An example case with the desired result would be:

import numpy as np

# sample data:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([2, 0, 2])
c = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

# for-loop solution:
for i in range(len(a)):
    a[i][b[i]] = 0.9 * c[i][b[i]]

# desired result:
print(a)
# [[ 1  2 27]
#  [36  5  6]
#  [ 7  8 81]]

EDIT 1

After implementing a modification of Rafael's answer, I now get the desired result without a for-loop. To my surprise, however, the indexing solution is slower than the for-loop.

import numpy as np
import time

# set seed for reproducibility:
np.random.seed(1)

x = np.random.randint(10, size=(10, 10))
y = np.random.randint(10, size=10)
z = np.random.randint(10, size=(10, 10))

# for-loop solution:
start1 = time.perf_counter()
for i in range(len(x)):
    x[i][y[i]] = 2 * z[i][y[i]]
end1 = time.perf_counter()
print("time loop: " + str(end1 - start1))
# time loop: 0.00045699999999726515
print("result for-loop:")
print(x)
# result for-loop:
# [[ 5  8  9  5  0  0  1  7  6  4]
#  [12  4  5  2  4  2  4  7  7  9]
#  [ 1  7  2  6  9  9  7  6  9  1]
#  [ 2  1  8  8  3  9  8  7  3  6]
#  [ 5  1  9  3  4  8  1 16  0  3]
#  [ 9 14  0  4  9  2  7  7  9  8]
#  [ 6  9  3  7  7  4  5  0  3  6]
#  [ 8  0  2  7  7  9  7  3  0 16]
#  [ 7  7  1  1  3  0  8  6 16  5]
#  [ 6  2  5  7 14  4  4  7  7  4]]

# set seed for reproducibility:
np.random.seed(1)

x = np.random.randint(10, size=(10, 10))
y = np.random.randint(10, size=10)
z = np.random.randint(10, size=(10, 10))

# indexing solution:
start2 = time.perf_counter()
r = x.shape[0]
x[range(r), y] = z[range(r), y] * 2
end2 = time.perf_counter()
print("time indexing: " + str(end2 - start2))
# time indexing: 0.0005479999999948859
print("result indexing:")
print(x)
# result indexing:
# [[ 5  8  9  5  0  0  1  7  6  4]
#  [12  4  5  2  4  2  4  7  7  9]
#  [ 1  7  2  6  9  9  7  6  9  1]
#  [ 2  1  8  8  3  9  8  7  3  6]
#  [ 5  1  9  3  4  8  1 16  0  3]
#  [ 9 14  0  4  9  2  7  7  9  8]
#  [ 6  9  3  7  7  4  5  0  3  6]
#  [ 8  0  2  7  7  9  7  3  0 16]
#  [ 7  7  1  1  3  0  8  6 16  5]
#  [ 6  2  5  7 14  4  4  7  7  4]]

What is causing this, and how can I achieve a speed-up?
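A single timing on a 10x10 array probably measures fixed per-call overhead and timer noise more than the work itself, so it may help to repeat the measurement on a larger array. Below is a minimal sketch of such a comparison using `timeit`. The function names `loop_update` and `indexed_update`, the array size `n` and the repeat count are my own assumptions for illustration, and the actual timings will depend on the machine.

```python
import timeit

import numpy as np


def loop_update(x, y, z):
    # Row-by-row update, as in the for-loop solution.
    for i in range(len(x)):
        x[i, y[i]] = 2 * z[i, y[i]]
    return x


def indexed_update(x, y, z):
    # Integer-array indexing: pairs row i with column y[i].
    rows = np.arange(x.shape[0])
    x[rows, y] = 2 * z[rows, y]
    return x


rng = np.random.default_rng(1)
n = 500  # hypothetical size, chosen only to make per-call overhead less dominant
x = rng.integers(10, size=(n, n))
y = rng.integers(n, size=n)
z = rng.integers(10, size=(n, n))

# Both approaches should produce the same array.
same = np.array_equal(loop_update(x.copy(), y, z), indexed_update(x.copy(), y, z))
print("identical results:", same)

# The update is idempotent (values always come from z), so repeating it in place is safe.
t_loop = timeit.timeit(lambda: loop_update(x, y, z), number=20)
t_index = timeit.timeit(lambda: indexed_update(x, y, z), number=20)
print(f"loop: {t_loop:.5f}s  indexing: {t_index:.5f}s")
```

Passing an `np.arange` array rather than a Python `range` avoids having NumPy convert the `range` object on every call, which may matter for small inputs.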

Upvotes: 3

Views: 161

Answers (2)

Abhishek Arya

Reputation: 500

It seems that masked arrays are actually slow. See here.

This is what a user said in another answer.

Keep in mind that MaskedArrays are more of a convenience than a real solution. If you need to perform intensive computations on arrays with missing/undefined values, you're in most cases better off dealing with the mask and the data yourself. Until a better implementation of missing/undefined values is baked in the NumPy code (which should happen some time soon), you are stuck with MaskedArrays. Yes, they are quite slow, because they're coded in pure Python, which of course cannot be as efficient as relying on some C code.

Hope it resolves your doubt.

Upvotes: 0

rafaelc

Reputation: 59274

IIUC

r = np.arange(a.shape[0]) # same as range(len(a)) here, but faster.
a[r, b] = c[r, b] * 0.9

array([[ 1,  2, 27],
       [36,  5,  6],
       [ 7,  8, 81]])
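To make explicit what the two index arrays do, here is a small sketch on the question's sample data. It shows that `c[rows, b]` picks one element per row, and it compares that with `np.take_along_axis`, which is an alternative I am adding for illustration and not part of the original answer. The names `rows`, `picked`, `picked_alt` and `result` are likewise my own.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([2, 0, 2])
c = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

rows = np.arange(a.shape[0])

# Pairing rows[i] with b[i] selects c[0, 2], c[1, 0] and c[2, 2].
picked = c[rows, b]

# Same selection with take_along_axis; the index array needs the same
# number of dimensions as c, hence b[:, None].
picked_alt = np.take_along_axis(c, b[:, None], axis=1).ravel()

# Work on a copy so the original a stays untouched.
result = a.copy()
# The float products are cast back to the integer dtype of the array on assignment.
result[rows, b] = 0.9 * picked
print(result)
```

Because `a` has an integer dtype, the scaled values are stored as integers; if fractional results matter, converting `a` to a float dtype first would be one option.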

Upvotes: 2
