Reputation: 441
I have two big 2D NumPy arrays: X1 has shape (1877055, 1299) and X2 has shape (1877055, 1445). I then use
X = np.hstack((X1, X2))
to concatenate the two arrays into a bigger array. However, the program dies and exits with code -9, without showing any error message.
What is the problem? How can I concatenate two NumPy arrays this large?
Upvotes: 3
Views: 8105
Reputation: 365767
Unless there's something wrong with your NumPy build or your OS (both of which are unlikely), this is almost certainly a memory error.
For example, let's say all these values are float64. So, you've already allocated at least 18GB and 20GB for these two arrays, and now you're trying to allocate another 38GB for the concatenated array. But you only have, say, 64GB of RAM plus 2GB of swap. So, there's not enough room to allocate another 38GB. On some platforms, this allocation will just fail, which hopefully NumPy would just catch and raise a MemoryError. On other platforms, the allocation may succeed, but as soon as you try to actually touch all of that memory the process gets killed (see overcommit handling in Linux for an example: the OOM killer sends SIGKILL, which is typically what an exit code of -9 indicates). On other platforms, the system will try to auto-expand swap, but then if you're out of disk space it'll segfault.
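To check the arithmetic, here is a rough sketch that computes the footprint from the shapes in the question without allocating anything. The float64 dtype is an assumption (the question doesn't say), and the helper name `nbytes` is just for illustration.

```python
import numpy as np

# Shapes from the question; float64 is an assumption, not something the asker stated.
shape_x1 = (1877055, 1299)
shape_x2 = (1877055, 1445)
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per element


def nbytes(shape, itemsize):
    """Bytes needed for a dense 2D array of the given shape."""
    rows, cols = shape
    return rows * cols * itemsize


GIB = 2 ** 30
x1_gib = nbytes(shape_x1, itemsize) / GIB
x2_gib = nbytes(shape_x2, itemsize) / GIB
x_gib = x1_gib + x2_gib  # the hstack result needs room for every column of both

print(f"X1: {x1_gib:.1f} GiB, X2: {x2_gib:.1f} GiB, X: {x_gib:.1f} GiB")
print(f"all three alive at once: {x1_gib + x2_gib + x_gib:.1f} GiB")
```

With a smaller dtype such as float32 the numbers halve, so it is worth checking `X1.dtype` before concluding anything.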
Whatever the reason, if you can't fit X1, X2, and X into memory at the same time, what can you do instead?
- Just build X in the first place, and fill X1 and X2 by filling sliced views of X.
- Write X1 and X2 out to disk, concatenate on disk, and read them back in.
- Send X1 and X2 to a subprocess that reads them iteratively and builds X and then continues the work.
Upvotes: 8
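The first two options can be sketched roughly as follows, using tiny stand-in shapes so the code runs anywhere. The helper `hstack_to_disk`, the file name `X.npy`, and the chunk size are made up for illustration, and the disk variant assumes both inputs share a dtype; treat this as one possible approach rather than the only one.

```python
import os
import tempfile

import numpy as np

# Small stand-in sizes; the question's arrays are (1877055, 1299) and (1877055, 1445).
n_rows, n1, n2 = 1000, 3, 5

# Option 1: allocate X once and make X1 and X2 views into it.
X = np.empty((n_rows, n1 + n2), dtype=np.float64)
X1 = X[:, :n1]  # basic slicing returns a view, not a copy
X2 = X[:, n1:]
X1[:] = 1.0     # placeholder for however X1 is really computed
X2[:] = 2.0     # placeholder for however X2 is really computed


# Option 2: build the result in a disk-backed .npy file, a block of rows at a time.
def hstack_to_disk(a, b, path, chunk_rows=256):
    """Hypothetical helper: write hstack((a, b)) into a memory-mapped .npy file."""
    rows, cols_a = a.shape
    out = np.lib.format.open_memmap(
        path, mode="w+", dtype=a.dtype, shape=(rows, cols_a + b.shape[1])
    )
    for start in range(0, rows, chunk_rows):
        stop = min(start + chunk_rows, rows)
        out[start:stop, :cols_a] = a[start:stop]
        out[start:stop, cols_a:] = b[start:stop]
    out.flush()
    return out


rng = np.random.default_rng(0)
a = rng.random((n_rows, n1))
b = rng.random((n_rows, n2))
with tempfile.TemporaryDirectory() as tmp:
    on_disk = hstack_to_disk(a, b, os.path.join(tmp, "X.npy"))
    disk_matches = np.array_equal(on_disk, np.hstack((a, b)))
    del on_disk  # drop the mapping before the temporary directory is removed
```

For the real data, the inputs to the disk variant would themselves need to be read lazily, for instance with `np.load(path, mmap_mode="r")` on files saved earlier, so that only one block of rows is resident at a time.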
Reputation: 1181
Not an expert in NumPy, but why not use numpy.concatenate()?
http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html
For example:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])
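One thing worth checking before switching: for 2D inputs, np.hstack and np.concatenate along axis=1 give the same result, and both return a fresh array. The small sketch below compares them on the arrays above, and also shows the `out=` argument of np.concatenate, which writes into an array you have already allocated; the variable names are only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# For 2D inputs, hstack joins along the second axis, like concatenate with axis=1.
stacked = np.hstack((a, b.T))
joined = np.concatenate((a, b.T), axis=1)
same_result = np.array_equal(stacked, joined)

# The result is a new array; it does not share memory with the inputs.
is_copy = not np.shares_memory(joined, a)

# concatenate can also fill a preallocated destination through out=.
dest = np.empty((2, 3), dtype=a.dtype)
np.concatenate((a, b.T), axis=1, out=dest)
```

So the peak memory requirement is the same with either function unless the destination already exists.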
Upvotes: -3