Excalibur

Reputation: 441

Concatenate two big numpy 2D arrays

I have two big NumPy 2D arrays: X1 has shape (1877055, 1299) and X2 has shape (1877055, 1445). I then use

X = np.hstack((X1, X2))

to concatenate the two arrays into a bigger one. However, the program doesn't finish: it exits with code -9 and doesn't show any error message.

What is the problem? How can I concatenate such two big numpy 2D arrays?

Upvotes: 3

Views: 8105

Answers (2)

abarnert

Reputation: 365767

Unless there's something wrong with your NumPy build or your OS (both of which are unlikely), this is almost certainly a memory error.

For example, say all these values are float64 (8 bytes each). Then you've already allocated about 18 GiB and 20 GiB for the two arrays, and now you're trying to allocate another 38 GiB for the concatenated array. If you have, say, 64 GiB of RAM plus 2 GiB of swap, there's not enough room for that extra 38 GiB.

What happens next depends on the platform. On some, the allocation simply fails, and NumPy should catch that and raise a MemoryError. On others, the allocation appears to succeed, but as soon as you actually touch all of that memory the process is killed (see overcommit handling in Linux for an example: the OOM killer sends SIGKILL, which is signal 9, matching the exit code of -9). On still others, the system tries to auto-expand swap, and if you're out of disk space the process crashes.
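The sizes above can be checked with a little arithmetic. This sketch assumes float64 (8 bytes per element), as the answer does; the question never states the dtype. Sizes are in GiB (2**30 bytes), which is where the 18, 20 and 38 figures come from.

```python
import numpy as np

# Shapes from the question; float64 is an assumption, the dtype was never stated.
SHAPE_X1 = (1877055, 1299)
SHAPE_X2 = (1877055, 1445)
ITEMSIZE = np.dtype(np.float64).itemsize  # 8 bytes per element
GIB = 2**30


def size_gib(shape):
    """Memory needed by a dense array with this shape, in GiB."""
    rows, cols = shape
    return rows * cols * ITEMSIZE / GIB


x1_gib = size_gib(SHAPE_X1)
x2_gib = size_gib(SHAPE_X2)
# hstack builds a new array holding every element of both inputs.
x_gib = size_gib((SHAPE_X1[0], SHAPE_X1[1] + SHAPE_X2[1]))
peak_gib = x1_gib + x2_gib + x_gib  # X1, X2 and X all alive at once

print(f"X1 {x1_gib:.1f} GiB, X2 {x2_gib:.1f} GiB, X {x_gib:.1f} GiB")
print(f"peak during hstack: about {peak_gib:.1f} GiB")
```

This prints roughly 18.2, 20.2 and 38.4 GiB for the three arrays, so the peak while hstack runs is close to 77 GiB.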

Whatever the reason, if you can't fit X1, X2, and X into memory at the same time, what can you do instead?

  • Allocate X up front, and make X1 and X2 sliced views of X, so that filling them fills X directly.
  • Write X1 and X2 out to disk, concatenate them on disk, and read the result back in.
  • Send X1 and X2 to a subprocess that reads them incrementally, builds X, and then continues the work.
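The first two options might look roughly like the sketch below. It is a hedged illustration, not code from the original answer: the tiny shapes stand in for the real ones, the fill loop is a hypothetical placeholder for however X1 and X2 are actually produced, and `concat_columns_on_disk` is a made-up helper name. Option 2 relies on NumPy's `.npy` memory maps (`np.load(..., mmap_mode="r")` and `np.lib.format.open_memmap`) so that only `chunk_rows` rows pass through RAM at a time.

```python
import os
import tempfile

import numpy as np

# --- Option 1: allocate X once and treat X1/X2 as views into it ---
n_rows, n1, n2 = 6, 3, 2  # tiny stand-ins for 1877055, 1299, 1445

X = np.empty((n_rows, n1 + n2), dtype=np.float64)
X1 = X[:, :n1]  # basic slicing returns a view, so writes land in X
X2 = X[:, n1:]

# Hypothetical fill step: replace with whatever code currently produces X1/X2.
# Filling a block of rows at a time keeps any temporaries small.
for start in range(0, n_rows, 2):
    stop = min(start + 2, n_rows)
    X1[start:stop] = np.arange(start * n1, stop * n1).reshape(stop - start, n1)
    X2[start:stop] = -1.0


# --- Option 2: concatenate on disk via memory-mapped .npy files ---
def concat_columns_on_disk(path1, path2, out_path, chunk_rows=100_000):
    """Write hstack(load(path1), load(path2)) to out_path, chunk_rows rows at a time."""
    A = np.load(path1, mmap_mode="r")  # pages are read from disk only when accessed
    B = np.load(path2, mmap_mode="r")
    if A.shape[0] != B.shape[0]:
        raise ValueError("inputs must have the same number of rows")
    n_a = A.shape[1]
    out = np.lib.format.open_memmap(
        out_path,
        mode="w+",
        dtype=np.result_type(A.dtype, B.dtype),
        shape=(A.shape[0], n_a + B.shape[1]),
    )
    for start in range(0, A.shape[0], chunk_rows):
        stop = min(start + chunk_rows, A.shape[0])
        out[start:stop, :n_a] = A[start:stop]
        out[start:stop, n_a:] = B[start:stop]
    out.flush()  # push all written pages to the file
    del out


with tempfile.TemporaryDirectory() as tmp:
    p1, p2, p_out = (os.path.join(tmp, f) for f in ("X1.npy", "X2.npy", "X.npy"))
    np.save(p1, X[:, :n1])
    np.save(p2, X[:, n1:])
    concat_columns_on_disk(p1, p2, p_out, chunk_rows=4)
    X_disk = np.load(p_out, mmap_mode="r")  # read the result back lazily
    X_from_disk = np.array(X_disk)  # copied into RAM only because this demo is tiny
    del X_disk
```

With the real data you would `del` the in-memory X1 and X2 right after `np.save`, so their memory is released before the on-disk concatenation runs.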

Upvotes: 8

ederollora

Reputation: 1181

I'm not an expert in NumPy, but why not use numpy.concatenate()?

http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html

For example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])
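A quick check (not part of the original answer) suggests why this alone may not help with the question's memory problem: for 2D inputs, np.hstack joins along axis 1 exactly like np.concatenate(..., axis=1), and both return a freshly allocated array. The array `c` below is just an illustrative column, equivalent to `b.T` above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
c = np.array([[5], [6]])  # same values as b.T in the example above

via_hstack = np.hstack((a, c))
via_concat = np.concatenate((a, c), axis=1)

# Same result: for 2-D inputs, hstack joins along axis 1 like concatenate(axis=1).
print(np.array_equal(via_hstack, via_concat))  # True
# Both build a new array that shares no memory with the inputs,
# so the peak memory requirement is the same either way.
print(np.shares_memory(via_concat, a), np.shares_memory(via_concat, c))  # False False
```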

Upvotes: -3
