Reputation: 2415
I have two large 2d numpy arrays with the same number of rows but different number of columns. Let's say arr1 has shape (num_rows1, num_cols1) and arr2 has shape (num_rows1, num_cols2).
I preallocated a numpy array arr12 of size (num_rows1, num_cols1 + num_cols2).
What is the most efficient way to copy arr1 and arr2 into arr12 such that arr1 is concatenated with arr2?
Is using this method of preallocation more efficient than numpy's concatenate method?
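For concreteness, a minimal sketch of the two approaches being compared (the shapes below are just illustrative placeholders):
import numpy as np

num_rows1, num_cols1, num_cols2 = 1000, 300, 200  # illustrative sizes
arr1 = np.random.rand(num_rows1, num_cols1)
arr2 = np.random.rand(num_rows1, num_cols2)

# Option 1: let numpy allocate and fill the result
arr12_cat = np.concatenate((arr1, arr2), axis=1)

# Option 2: preallocate the output, then copy into column slices
arr12 = np.empty((num_rows1, num_cols1 + num_cols2), dtype=arr1.dtype)
arr12[:, :num_cols1] = arr1
arr12[:, num_cols1:] = arr2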
Upvotes: 1
Views: 481
Reputation: 221614
We will benchmark across various datasets and draw conclusions from them. We will use the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.
Benchmarking code:
import numpy as np
import benchit

def numpy_concatenate(a, b):
    return np.concatenate((a,b),axis=1)

def numpy_hstack(a, b):
    return np.hstack((a,b))

def preallocate(a, b):
    # allocate the output once, then copy each input into its column block
    m,n = a.shape[1], b.shape[1]
    out = np.empty((a.shape[0],m+n), dtype=np.result_type((a.dtype, b.dtype)))
    out[:,:m] = a
    out[:,m:] = b
    return out

funcs = [numpy_concatenate, numpy_hstack, preallocate]
R = np.random.rand
# a is fixed at 1000x1000; the column count of b varies
inputs = {n: (R(1000,1000), R(1000,n)) for n in [100, 200, 500, 1000, 2000, 5000]}
t = benchit.timings(funcs, inputs, multivar=True, input_name='Col length of b')
t.plot(logy=False, logx=True, savepath='plot_1000rows.png')
Conclusion: The timings are comparable.
On the memory side, np.hstack should behave similarly to np.concatenate, so we will use just one of them.
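For reference, a minimal sketch checking that the two produce identical results for 2-D inputs:
import numpy as np

a, b = np.random.rand(4, 3), np.random.rand(4, 2)
# For 2-D arrays, hstack stacks along axis=1, exactly like concatenate
assert np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))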
Let's set up an input dataset with large 2D arrays and do some memory benchmarking.
Setup code:
# Filename : memprof_npconcat_preallocate.py
import numpy as np
from memory_profiler import profile

@profile(precision=10)
def numpy_concatenate(a, b):
    return np.concatenate((a,b),axis=1)

@profile(precision=10)
def preallocate(a, b):
    m,n = a.shape[1], b.shape[1]
    out = np.empty((a.shape[0],m+n), dtype=np.result_type((a.dtype, b.dtype)))
    out[:,:m] = a
    out[:,m:] = b
    return out

R = np.random.rand
a,b = R(1000,1000), R(1000,1000)

if __name__ == '__main__':
    numpy_concatenate(a, b)
    preallocate(a, b)
So, a is 1000x1000 and so is b.
Run:
$ python3 -m memory_profiler memprof_npconcat_preallocate.py
Filename: memprof_npconcat_preallocate.py
Line # Mem usage Increment Line Contents
================================================
9 69.3281250000 MiB 69.3281250000 MiB @profile(precision=10)
10 def numpy_concatenate(a, b):
11 84.5546875000 MiB 15.2265625000 MiB return np.concatenate((a,b),axis=1)
Filename: memprof_npconcat_preallocate.py
Line # Mem usage Increment Line Contents
================================================
13 69.3554687500 MiB 69.3554687500 MiB @profile(precision=10)
14 def preallocate(a, b):
15 69.3554687500 MiB 0.0000000000 MiB m,n = a.shape[1], b.shape[1]
16 69.3554687500 MiB 0.0000000000 MiB out = np.empty((a.shape[0],m+n), dtype=np.result_type((a.dtype, b.dtype)))
17 83.6484375000 MiB 14.2929687500 MiB out[:,:m] = a
18 84.4218750000 MiB 0.7734375000 MiB out[:,m:] = b
19 84.4218750000 MiB 0.0000000000 MiB return out
Thus, for the preallocate method, the total memory increment is 14.2929687500 + 0.7734375000 MiB, which is slightly less than the 15.2265625000 MiB increment for numpy_concatenate.
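As a rough sanity check on these numbers, the output is a float64 array of shape (1000, 2000), so its size accounts for essentially the whole increment:
# 1000 rows x 2000 cols x 8 bytes (float64) = 16,000,000 bytes
print(1000 * 2000 * 8 / 2**20)  # -> 15.2587890625 MiB, close to the ~15.2 MiB reported above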
Changing the sizes of the input arrays to 5000x5000 for both a and b:
$ python3 -m memory_profiler memprof_npconcat_preallocate.py
Filename: memprof_npconcat_preallocate.py
Line # Mem usage Increment Line Contents
================================================
9 435.4101562500 MiB 435.4101562500 MiB @profile(precision=10)
10 def numpy_concatenate(a, b):
11 816.8515625000 MiB 381.4414062500 MiB return np.concatenate((a,b),axis=1)
Filename: memprof_npconcat_preallocate.py
Line # Mem usage Increment Line Contents
================================================
13 435.5351562500 MiB 435.5351562500 MiB @profile(precision=10)
14 def preallocate(a, b):
15 435.5351562500 MiB 0.0000000000 MiB m,n = a.shape[1], b.shape[1]
16 435.5351562500 MiB 0.0000000000 MiB out = np.empty((a.shape[0],m+n), dtype=np.result_type((a.dtype, b.dtype)))
17 780.3203125000 MiB 344.7851562500 MiB out[:,:m] = a
18 816.9296875000 MiB 36.6093750000 MiB out[:,m:] = b
19 816.9296875000 MiB 0.0000000000 MiB return out
Again, the total increment for preallocation is smaller.
Conclusion: The preallocation method has a slight memory advantage, which makes sense in a way. With concatenate, three arrays are in play at once (src1 + src2 -> dst), whereas with preallocation each copy step touches only one source and the destination, so there is less memory congestion, albeit in two steps.
Upvotes: 1
Reputation: 231475
numpy compiled code such as concatenate typically determines how large a return array it needs, creates that array, and copies values into it. The fact that it does this with C-API calls makes no difference to the memory use. concatenate does not overwrite or reuse any of the memory used by its arguments.
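A quick way to confirm this is np.shares_memory; a minimal sketch:
import numpy as np

A, B = np.ones((1000, 1000)), np.zeros((1000, 500))
C = np.concatenate((A, B), axis=1)
# The result shares no memory with either argument
print(np.shares_memory(C, A), np.shares_memory(C, B))  # False False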
In [465]: A, B = np.ones((1000,1000)), np.zeros((1000,500))
Some time comparisons:
In [466]: timeit np.concatenate((A,B), axis=1)
6.73 ms ± 338 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [467]: C = np.zeros((1000,1500))
In [468]: timeit np.concatenate((A,B), axis=1, out=C)
6.44 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [469]: %%timeit
...: C = np.zeros((1000,1500))
...: np.concatenate((A,B), axis=1, out=C)
11.5 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [470]: %%timeit
...: C = np.zeros((1000,1500))
...: C[:,:1000]=A; C[:,1000:]=B
11.5 ms ± 282 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [471]: %%timeit
...: C[:,:1000]=A; C[:,1000:]=B
6.29 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So if the target array already exists, use it. But there doesn't appear to be much of an advantage to creating one just for the purpose.
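A sketch of the pattern that benefits from an existing target: allocating the output buffer once and reusing it across repeated concatenations (the loop below is purely illustrative):
import numpy as np

A, B = np.ones((1000, 1000)), np.zeros((1000, 500))
C = np.empty((1000, 1500))                  # allocated once, outside the loop

for _ in range(10):
    # ... A and B would be updated in place here ...
    np.concatenate((A, B), axis=1, out=C)   # reuses C's buffer each iteration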
Upvotes: 1