Reputation: 25528
In NumPy, why does hstack() copy the data from the arrays being stacked:
import numpy as np
A, B = np.array([1, 2]), np.array([3, 4])
C = np.hstack((A, B))
A[0] = 99  # modify A after stacking
gives for C:
array([1, 2, 3, 4])
whereas hsplit() creates a view on the data:
a = np.array(((1, 2), (3, 4)))
b, c = np.hsplit(a, 2)
a[0, 0] = 99  # modify a after splitting
gives for b:
array([[99],
       [ 3]])
To be clear, I am asking about the reasoning behind this behaviour (which I find inconsistent and hard to remember), not just what happens: I accept that it happens because it's coded that way.
Upvotes: 6
Views: 950
Reputation: 68682
Basically the underlying ndarray data structure only has a single pointer to the start of its data's memory and then stride information about how to move through each dimension. If you concatenate two arrays, it won't know how to move from one memory location to the other. On the other hand, if you split an array into two arrays, each can easily store a pointer to the first element (which is somewhere inside the original array).
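To make the pointer-plus-strides picture concrete, here is a small sketch using the array from the question. It reads the data pointer through `__array_interface__`; the names `a_ptr`, `b_ptr` and `c_ptr` are just illustrative. It should show that the two pieces returned by hsplit() start at different offsets inside the same buffer as `a`:

```python
import numpy as np

a = np.array(((1, 2), (3, 4)))
b, c = np.hsplit(a, 2)

# Address of the first element of each array (first entry of the "data" tuple).
a_ptr = a.__array_interface__["data"][0]
b_ptr = b.__array_interface__["data"][0]
c_ptr = c.__array_interface__["data"][0]

# Byte offsets of each piece relative to the start of a's buffer:
# b starts where a starts, c starts one element further along.
print(b_ptr - a_ptr, c_ptr - a_ptr, a.itemsize)

# Each piece steps from row to row with the parent's row stride.
print(a.strides, b.strides, c.strides)
```

A single (pointer, strides) pair is enough to describe each piece here, whereas no such pair could describe "A followed by B" when A and B live in unrelated buffers.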
The basic C implementation is here, and there is a good discussion at:
http://scipy-lectures.github.io/advanced/advanced_numpy/index.html#life-of-ndarray
Upvotes: 6
Reputation: 179442
NumPy generally tries to create views whenever possible, since memory copies are inefficient and can quite quickly eat up a lot of cycles.
hsplit splits the input array into multiple output arrays. The output arrays can each be views into a portion of the original parent array (since they are basically simple slices). Thus, for efficiency, NumPy creates views, instead of copies.
hstack combines two completely separate arrays into a single output array. The underlying array implementation cannot handle two separate data sources in a single array, so there is no way to share the data with the original. Thus, NumPy is forced to create a copy.
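If you'd rather check than remember, `np.shares_memory` reports whether two arrays overlap in memory. The sketch below applies it to the examples from the question. The second half is only a suggestion for when you actually want the stacked result to share data: allocate the combined buffer first and take the parts as slices of it (`buf`, `left` and `right` are made-up names for this illustration).

```python
import numpy as np

# hstack: the result lives in a fresh buffer, so nothing is shared.
A, B = np.array([1, 2]), np.array([3, 4])
C = np.hstack((A, B))
stack_shares = np.shares_memory(C, A) or np.shares_memory(C, B)
print(stack_shares)  # False

# hsplit: both pieces are views into the parent's buffer.
a = np.array(((1, 2), (3, 4)))
b, c = np.hsplit(a, 2)
split_shares = np.shares_memory(b, a) and np.shares_memory(c, a)
print(split_shares)  # True

# Assumed workaround: build the big array first, then slice out the parts.
buf = np.empty(4, dtype=A.dtype)
left, right = buf[:2], buf[2:]   # views into buf
left[:] = A
right[:] = B
left[0] = 99                     # visible through buf as well
print(buf[0])  # 99
```

This works because slicing goes in the direction a single pointer plus strides can describe: from one buffer to parts of it, not from several buffers to one array.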
Upvotes: 5