Reputation: 319
Lets say that I have the following numpy array
:
[[1,1,1]
[1,1,1]
[1,1,1]]
And I need to pad each element in the array with a zero on either side (rather then numpy.pad()
which pads rows and columns). This ends up as follows:
[ [0,1,0,0,1,0,0,1,0]
[0,1,0,0,1,0,0,1,0]
[0,1,0,0,1,0,0,1,0] ]
Is there a more efficient way to do this then creating an empty array and using nested loops?
Note: My preference is as fast and memory light as possible. Individual arrays can be up to 12000^2 elements and I'm working with 16 of them at the same time so my margins are pretty thin in 32 bit
Edit: Should have specified but the padding is not always 1, padding must be variable as I am up-sampling data by a factor dependent on the array with the highest resolution. given 3 arrays with the shapes (121,121) ; (1200,1200) ; (12010,12010) I need to be able to pad the first two arrays to a shape of (12010,12010) (I know that these numbers don't share common factors, this isn't a problem as being within an index or two of the actual location is acceptable, this padding is just needed to get them into the same shape, rounding out the numbers by padding the rows at the ends is acceptable)
Working Solution: an adjustment of @Kasramvd solution does the trick. Here is the code that fits my application of the problem.
import numpy as np
a = np.array([[1, 2, 3],[1, 2, 3], [1, 2, 3]])
print(a)
x, y = a.shape
factor = 3
indices = np.repeat(np.arange(y + 1), 1*factor*2)[1*factor:-1*factor]
a=np.insert(a, indices, 0, axis=1)
print(a)
results in:
[[1 2 3]
[1 2 3]
[1 2 3]]
[[0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0]
[0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0]
[0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0]]
Upvotes: 3
Views: 2398
Reputation: 231385
A problem with time and memory comparison of these methods is that it treats insert
as a blackbox. But that function is Python code that can be read and replicated. While it can handle various kinds of inputs, in this case I think it
new
target array of the right sizeThere's no way that insert
can be more efficient than Divakar's
padcols
.
Let's see if I can clearly replicate insert
:
In [255]: indices = np.repeat(np.arange(y + 1), 1*factor*2)[1*factor:-1*factor]
In [256]: indices
Out[256]: array([0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3])
In [257]: numnew = len(indices)
In [258]: order = indices.argsort(kind='mergesort')
In [259]: indices[order] += np.arange(numnew)
In [260]: indices
Out[260]:
array([ 0, 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 18, 19,
20])
these are the columns that will take the 0
fill value.
In [266]: new = np.empty((3,21),a.dtype)
In [267]: new[:,indices] = 0 # fill
# in this case with a lot of fills
# np.zeros((3,21),a.dtype) would be just as good
In [270]: old_mask = np.ones((21,),bool)
In [271]: old_mask[indices] = False
In [272]: new[:,old_mask] = a
In [273]: new
Out[273]:
array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0]])
The main difference from padcols
is this uses a boolean mask for indexing as opposed to column numbers.
Upvotes: 0
Reputation: 13510
Flatten the array, convert each 1 to [0, 1, 0], then reshape again to 3 rows. In the following code the ones array is in var a:
a = np.ones([3,3])
b = [[0, x, 0] for x in a.ravel()]
c = np.reshape(b, (a.shape[0], -1))
print(c)
Output:
[[0 1 0 0 1 0 0 1 0]
[0 1 0 0 1 0 0 1 0]
[0 1 0 0 1 0 0 1 0]]
Upvotes: 1
Reputation: 221564
Here's an approach using zeros-initialization
-
def padcols(arr,padlen):
N = 1+2*padlen
m,n = arr.shape
out = np.zeros((m,N*n),dtype=arr.dtype)
out[:,padlen+np.arange(n)*N] = arr
return out
Sample run -
In [118]: arr
Out[118]:
array([[21, 14, 23],
[52, 70, 90],
[40, 57, 11],
[71, 33, 78]])
In [119]: padcols(arr,1)
Out[119]:
array([[ 0, 21, 0, 0, 14, 0, 0, 23, 0],
[ 0, 52, 0, 0, 70, 0, 0, 90, 0],
[ 0, 40, 0, 0, 57, 0, 0, 11, 0],
[ 0, 71, 0, 0, 33, 0, 0, 78, 0]])
In [120]: padcols(arr,2)
Out[120]:
array([[ 0, 0, 21, 0, 0, 0, 0, 14, 0, 0, 0, 0, 23, 0, 0],
[ 0, 0, 52, 0, 0, 0, 0, 70, 0, 0, 0, 0, 90, 0, 0],
[ 0, 0, 40, 0, 0, 0, 0, 57, 0, 0, 0, 0, 11, 0, 0],
[ 0, 0, 71, 0, 0, 0, 0, 33, 0, 0, 0, 0, 78, 0, 0]])
In this section, I am benchmarking on runtime and memory usage the approach posted in this post : padcols
and @Kasramvd's solution func : padder
on a decent sized array for various padding lengths.
Timing profiling
In [151]: arr = np.random.randint(10,99,(300,300))
# Representative of original `3x3` sized array just bigger
In [152]: %timeit padder(arr,1)
100 loops, best of 3: 3.56 ms per loop
In [153]: %timeit padcols(arr,1)
100 loops, best of 3: 2.13 ms per loop
In [154]: %timeit padder(arr,2)
100 loops, best of 3: 5.82 ms per loop
In [155]: %timeit padcols(arr,2)
100 loops, best of 3: 3.66 ms per loop
In [156]: %timeit padder(arr,3)
100 loops, best of 3: 7.83 ms per loop
In [157]: %timeit padcols(arr,3)
100 loops, best of 3: 5.11 ms per loop
Memory profiling
Script used for these memory tests -
import numpy as np
from memory_profiler import profile
arr = np.random.randint(10,99,(300,300))
padlen = 1 # Edited to 1,2,3 for the three cases
n = padlen
@profile(precision=10)
def padder():
x, y = arr.shape
indices = np.repeat(np.arange(y+1), n*2)[n:-n]
return np.insert(arr, indices, 0, axis=1)
@profile(precision=10)
def padcols():
N = 1+2*padlen
m,n = arr.shape
out = np.zeros((m,N*n),dtype=arr.dtype)
out[:,padlen+np.arange(n)*N] = arr
return out
if __name__ == '__main__':
padder()
if __name__ == '__main__':
padcols()
Memory usage output -
Case # 1:
$ python -m memory_profiler timing_pads.py
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
8 42.4492187500 MiB 0.0000000000 MiB @profile(precision=10)
9 def padder():
10 42.4492187500 MiB 0.0000000000 MiB x, y = arr.shape
11 42.4492187500 MiB 0.0000000000 MiB indices = np.repeat(np.arange(y+1), n*2)[n:-n]
12 44.7304687500 MiB 2.2812500000 MiB return np.insert(arr, indices, 0, axis=1)
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
14 42.8750000000 MiB 0.0000000000 MiB @profile(precision=10)
15 def padcols():
16 42.8750000000 MiB 0.0000000000 MiB N = 1+2*padlen
17 42.8750000000 MiB 0.0000000000 MiB m,n = arr.shape
18 42.8750000000 MiB 0.0000000000 MiB out = np.zeros((m,N*n),dtype=arr.dtype)
19 44.6757812500 MiB 1.8007812500 MiB out[:,padlen+np.arange(n)*N] = arr
20 44.6757812500 MiB 0.0000000000 MiB return out
Case # 2:
$ python -m memory_profiler timing_pads.py
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
8 42.3710937500 MiB 0.0000000000 MiB @profile(precision=10)
9 def padder():
10 42.3710937500 MiB 0.0000000000 MiB x, y = arr.shape
11 42.3710937500 MiB 0.0000000000 MiB indices = np.repeat(np.arange(y+1), n*2)[n:-n]
12 46.2421875000 MiB 3.8710937500 MiB return np.insert(arr, indices, 0, axis=1)
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
14 42.8476562500 MiB 0.0000000000 MiB @profile(precision=10)
15 def padcols():
16 42.8476562500 MiB 0.0000000000 MiB N = 1+2*padlen
17 42.8476562500 MiB 0.0000000000 MiB m,n = arr.shape
18 42.8476562500 MiB 0.0000000000 MiB out = np.zeros((m,N*n),dtype=arr.dtype)
19 46.1289062500 MiB 3.2812500000 MiB out[:,padlen+np.arange(n)*N] = arr
20 46.1289062500 MiB 0.0000000000 MiB return out
Case # 3:
$ python -m memory_profiler timing_pads.py
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
8 42.3906250000 MiB 0.0000000000 MiB @profile(precision=10)
9 def padder():
10 42.3906250000 MiB 0.0000000000 MiB x, y = arr.shape
11 42.3906250000 MiB 0.0000000000 MiB indices = np.repeat(np.arange(y+1), n*2)[n:-n]
12 47.4765625000 MiB 5.0859375000 MiB return np.insert(arr, indices, 0, axis=1)
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
14 42.8945312500 MiB 0.0000000000 MiB @profile(precision=10)
15 def padcols():
16 42.8945312500 MiB 0.0000000000 MiB N = 1+2*padlen
17 42.8945312500 MiB 0.0000000000 MiB m,n = arr.shape
18 42.8945312500 MiB 0.0000000000 MiB out = np.zeros((m,N*n),dtype=arr.dtype)
19 47.4648437500 MiB 4.5703125000 MiB out[:,padlen+np.arange(n)*N] = arr
20 47.4648437500 MiB 0.0000000000 MiB return out
Upvotes: 7
Reputation: 107287
You can create the related indices with np.repeat
based on array's shape, then insert the 0 in that indices.
>>> def padder(arr, n):
... x, y = arr.shape
... indices = np.repeat(np.arange(y+1), n*2)[n:-n]
... return np.insert(arr, indices, 0, axis=1)
...
>>>
>>> padder(a, 1)
array([[0, 1, 0, 0, 1, 0, 0, 1, 0],
[0, 1, 0, 0, 1, 0, 0, 1, 0],
[0, 1, 0, 0, 1, 0, 0, 1, 0]])
>>>
>>> padder(a, 2)
array([[0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]])
>>> padder(a, 3)
array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]])
Aforementioned approach in one line:
np.insert(a, np.repeat(np.arange(a.shape[1] + 1), n*2)[n:-n], 0, axis=1)
Upvotes: 2