morfys

Reputation: 2415

How to copy two 2D numpy arrays to a preallocated array

I have two large 2D numpy arrays with the same number of rows but different numbers of columns: arr1 has shape (num_rows1, num_cols1) and arr2 has shape (num_rows1, num_cols2).

I preallocated a numpy array arr12 of shape (num_rows1, num_cols1 + num_cols2).

What is the most efficient way to copy arr1 and arr2 into arr12 such that arr1 is concatenated with arr2?
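
For concreteness, the slice-assignment version I have in mind looks like this (a minimal sketch with made-up sizes):

import numpy as np

num_rows1, num_cols1, num_cols2 = 1000, 600, 400  # example sizes
arr1 = np.random.rand(num_rows1, num_cols1)
arr2 = np.random.rand(num_rows1, num_cols2)

arr12 = np.empty((num_rows1, num_cols1 + num_cols2))  # preallocated target

# Copy each source into its block of columns.
arr12[:, :num_cols1] = arr1
arr12[:, num_cols1:] = arr2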

Is using this method of preallocation more efficient than numpy's concatenate method?

Upvotes: 1

Views: 481

Answers (2)

Divakar

Reputation: 221614

Benchmarking

We will benchmark the proposed approaches across various datasets and draw conclusions from the results.

Timings

Using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.

Benchmarking code :

import numpy as np
import benchit

def numpy_concatenate(a, b):
    return np.concatenate((a,b),axis=1)

def numpy_hstack(a, b):
    return np.hstack((a,b))

def preallocate(a, b):
    m,n = a.shape[1], b.shape[1]
    out = np.empty((a.shape[0],m+n), dtype=np.result_type(a.dtype, b.dtype))
    out[:,:m] = a
    out[:,m:] = b
    return out
    
funcs = [numpy_concatenate, numpy_hstack, preallocate]
R = np.random.rand 

inputs = {n: (R(1000,1000), R(1000,n)) for n in [100, 200, 500, 1000, 2000, 5000]}
t = benchit.timings(funcs, inputs, multivar=True, input_name='Col length of b')
t.plot(logy=False, logx=True, savepath='plot_1000rows.png')

[Plot: runtimes of the three functions vs. column length of b, for 1000-row arrays]

Conclusion : The three approaches are comparable on timings.

Memory profiling

On the memory side, np.hstack should behave the same as np.concatenate, since for 2D inputs hstack is just concatenation along axis=1 (quick check below). So, we will profile only np.concatenate.
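
A minimal sanity check of that equivalence, with toy arrays:

import numpy as np

a, b = np.arange(6.).reshape(2, 3), np.zeros((2, 2))
# For 2D inputs, hstack stacks along axis 1, i.e. plain column-wise concatenation.
assert np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))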

Let's setup an input dataset with large 2D arrays. We will do some memory benchmarking.

Setup code :

# Filename : memprof_npconcat_preallocate.py
import numpy as np
from memory_profiler import profile

@profile(precision=10)
def numpy_concatenate(a, b):
    return np.concatenate((a,b),axis=1)

@profile(precision=10)
def preallocate(a, b):
    m,n = a.shape[1], b.shape[1]
    out = np.empty((a.shape[0],m+n), dtype=np.result_type(a.dtype, b.dtype))
    out[:,:m] = a
    out[:,m:] = b
    return out

R = np.random.rand
a,b = R(1000,1000), R(1000,1000)

if __name__ == '__main__':
    numpy_concatenate(a, b)
    preallocate(a, b)

Here, both a and b are 1000x1000.

Run :

$ python3 -m memory_profiler memprof_npconcat_preallocate.py 
Filename: memprof_npconcat_preallocate.py

Line #    Mem usage    Increment   Line Contents
================================================
     9  69.3281250000 MiB  69.3281250000 MiB   @profile(precision=10)
    10                             def numpy_concatenate(a, b):
    11  84.5546875000 MiB  15.2265625000 MiB       return np.concatenate((a,b),axis=1)


Filename: memprof_npconcat_preallocate.py

Line #    Mem usage    Increment   Line Contents
================================================
    13  69.3554687500 MiB  69.3554687500 MiB   @profile(precision=10)
    14                             def preallocate(a, b):
    15  69.3554687500 MiB   0.0000000000 MiB       m,n = a.shape[1], b.shape[1]
    16  69.3554687500 MiB   0.0000000000 MiB       out = np.empty((a.shape[0],m+n), dtype=np.result_type(a.dtype, b.dtype))
    17  83.6484375000 MiB  14.2929687500 MiB       out[:,:m] = a
    18  84.4218750000 MiB   0.7734375000 MiB       out[:,m:] = b
    19  84.4218750000 MiB   0.0000000000 MiB       return out

Thus, for the preallocate method, the total memory consumption is 14.2929687500 + 0.7734375000 ≈ 15.07 MiB, slightly less than the 15.2265625000 MiB for np.concatenate.

Changing the sizes of the input arrays to 5000x5000 for both a and b -

$ python3 -m memory_profiler memprof_npconcat_preallocate.py
Filename: memprof_npconcat_preallocate.py

Line #    Mem usage    Increment   Line Contents
================================================
     9 435.4101562500 MiB 435.4101562500 MiB   @profile(precision=10)
    10                             def numpy_concatenate(a, b):
    11 816.8515625000 MiB 381.4414062500 MiB       return np.concatenate((a,b),axis=1)


Filename: memprof_npconcat_preallocate.py

Line #    Mem usage    Increment   Line Contents
================================================
    13 435.5351562500 MiB 435.5351562500 MiB   @profile(precision=10)
    14                             def preallocate(a, b):
    15 435.5351562500 MiB   0.0000000000 MiB       m,n = a.shape[1], b.shape[1]
    16 435.5351562500 MiB   0.0000000000 MiB       out = np.empty((a.shape[0],m+n), dtype=np.result_type(a.dtype, b.dtype))
    17 780.3203125000 MiB 344.7851562500 MiB       out[:,:m] = a
    18 816.9296875000 MiB  36.6093750000 MiB       out[:,m:] = b
    19 816.9296875000 MiB   0.0000000000 MiB       return out

Again, the total for preallocation (344.7851562500 + 36.6093750000 ≈ 381.39 MiB) is slightly less than the 381.4414062500 MiB for np.concatenate.

Conclusion : The preallocation method has a slight memory advantage, which makes sense. With concatenate, three arrays are live at once (src1 + src2 -> dst), whereas with preallocation each of the two copy steps involves only one src and the dst, so there is less memory congestion.
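
As a cross-check that needs no extra package, the same comparison can be run with the standard library's tracemalloc (a sketch; it assumes a NumPy recent enough to register its array buffers with tracemalloc, and the two peaks may come out nearly identical since the output array dominates both):

import numpy as np
import tracemalloc

R = np.random.rand
a, b = R(1000, 1000), R(1000, 1000)

def preallocate(a, b):
    m, n = a.shape[1], b.shape[1]
    out = np.empty((a.shape[0], m + n), dtype=np.result_type(a.dtype, b.dtype))
    out[:, :m] = a
    out[:, m:] = b
    return out

def peak_mib(run):
    # Peak memory traced while run() executes, in MiB.
    tracemalloc.start()
    run()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 2**20

print('concatenate:', peak_mib(lambda: np.concatenate((a, b), axis=1)))
print('preallocate:', peak_mib(lambda: preallocate(a, b)))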

Upvotes: 1

hpaulj

Reputation: 231475

numpy compiled code such as concatenate typically determines how large a return array it needs, creates that array, and copies values into it. The fact that it does this with C-API calls makes no difference to the memory use. concatenate does not overwrite or reuse any of the memory used by its arguments.
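
One quick way to confirm that the result is a fresh buffer rather than a view of the inputs (a minimal sketch):

import numpy as np

A, B = np.ones((1000, 1000)), np.zeros((1000, 500))
C = np.concatenate((A, B), axis=1)

# concatenate copies; the result never aliases its inputs.
print(np.shares_memory(C, A))  # False
print(np.shares_memory(C, B))  # False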

In [465]: A, B = np.ones((1000,1000)), np.zeros((1000,500))

Some time comparisons:

In [466]: timeit np.concatenate((A,B), axis=1)                                                         
6.73 ms ± 338 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [467]: C = np.zeros((1000,1500))                                                                    
In [468]: timeit np.concatenate((A,B), axis=1, out=C)                                                  
6.44 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [469]: %%timeit 
     ...: C = np.zeros((1000,1500)) 
     ...: np.concatenate((A,B), axis=1, out=C)                                                                                               
11.5 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [470]: %%timeit 
     ...: C = np.zeros((1000,1500)) 
     ...: C[:,:1000]=A; C[:,1000:]=B                                                                                             
11.5 ms ± 282 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [471]: %%timeit 
     ...: C[:,:1000]=A; C[:,1000:]=B                                                                                              
6.29 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So if the target array already exists, use it (In [471]). But there doesn't appear to be much advantage to creating one just for this purpose: once the cost of allocating C is included (In [469] and In [470]), the total is actually worse than a plain concatenate.
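
The pattern where preallocation genuinely pays off is reusing one target across many iterations, so the allocation happens only once (a sketch of that pattern, under the assumption that every iteration produces a result of the same shape):

import numpy as np

A, B = np.ones((1000, 1000)), np.zeros((1000, 500))
C = np.empty((1000, 1500))  # allocated once, outside the loop

for _ in range(100):
    # Refill the existing buffer instead of allocating a new result each time.
    C[:, :1000] = A
    C[:, 1000:] = B
    # ... consume C here ...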

Upvotes: 1
