dhokas
dhokas

Reputation: 1789

efficiently fill a tensor in numpy

I have a numpy array X of size (n, m) and of type np.uint8 (so it only contains values in [0, 255]). I also have a mapping f from [0, 255] to [0, 3].

I want to create an array Y of shape (4, n, m) such that y_{k, i, j} = 1 if k == f(x_{i, j}) and 0 otherwise. For now, I do it like this:

Y = np.zeros((4, n, m))
for i in range(256):
  Y[f(i), X == i] = 1

But this is super slow, and I'm not able to find a more efficient way to do it. Any ideas?

Upvotes: 1

Views: 912

Answers (2)

hpaulj
hpaulj

Reputation: 231385

The mix of scalar indexing and boolean appears to be hurting your speed:

In [706]: %%timeit
     ...: Y=np.zeros((4,3,4))
     ...: for i in range(256):
     ...:   Y[f(i), X==i]+=1
     ...: 

100 loops, best of 3: 12.5 ms per loop

In [722]: %%timeit
     ...: Y=np.zeros((4,3,4))
     ...: for i in range(256):
     ...:     I,J=np.where(X==i)
     ...:     Y[f(i),I,J] = 1
     ...: 
100 loops, best of 3: 8.55 ms per loop

This is for

X=np.arange(12,dtype=np.uint8).reshape(3,4)
def f(i):
    return i%4

In this case, the f(i) is not a major time consumer:

In [718]: timeit K=[f(i) for i in range(256)]
10000 loops, best of 3: 120 µs per loop

but getting the X==i indexes is slow

In [720]: timeit K=[X==i for i in range(256)]
1000 loops, best of 3: 1.29 ms per loop
In [721]: timeit K=[np.where(X==i) for i in range(256)]
100 loops, best of 3: 2.73 ms per loop

We need to rethink the X==i part of the mapping, rather than the f(i) part.

=====================

Flattening the last 2 dimensions helps;

In [780]: %%timeit 
     ...: X1=X.ravel()
     ...: Y=np.zeros((4,12))
     ...: for i in range(256):
     ...:     Y[f(i),X1==i]=1
     ...: Y.shape=(4,3,4)
     ...: 
100 loops, best of 3: 3.16 ms per loop

Upvotes: 1

Divakar
Divakar

Reputation: 221534

Assuming f could operate on all iterating values in one-go, you can use broadcasting -

Yout = (f(X) == np.arange(4)[:,None,None]).astype(int)

Runtime test and verification -

In [35]: def original_app(X,n,m):
    ...:     Y = np.zeros((4, n, m))
    ...:     for i in range(256):
    ...:         Y[f(i), X == i] = 1
    ...:     return Y
    ...: 

In [36]: # Setup Inputs
    ...: n,m = 2000,2000
    ...: X = np.random.randint(0,255,(n,m)).astype('uint8')
    ...: v = np.random.randint(4, size=(256,))
    ...: def f(x): 
    ...:     return v[x]
    ...: 

In [37]: Y = original_app(X,n,m)
    ...: Yout = (f(X) == np.arange(4)[:,None,None]).astype(int)
    ...: 

In [38]: np.allclose(Yout,Y)  # Verify
Out[38]: True

In [39]: %timeit original_app(X,n,m)
1 loops, best of 3: 3.77 s per loop

In [40]: %timeit (f(X) == np.arange(4)[:,None,None]).astype(int)
10 loops, best of 3: 74.5 ms per loop

Upvotes: 1

Related Questions