Reputation: 22270
Does numpy allocate new matrices for every operation you perform on a matrix?
For example:
A = np.random.rand(10, 20)
A = 2 * A # Operation 1: Is a copy of A made, and a reference assigned to A?
B = 2 * A # Operation 2: Does B get a completely different copy of A?
C = A # Operation 3: Does C get a reference to A?
And slice operations:
A[0, :] = 3
How about chained operations?
D = A * B * C # Elementwise multiplication, if A * B allocates memory, does
# (A * B) * C allocate another patch of memory?
Numpy's a fantastic library, but I just want to know what happens under the hood. My intuition says that slice operations modify the memory view in place, but I don't know about assignments.
Upvotes: 5
Views: 1611
Reputation: 231385
Keep in mind that a numpy array is a Python object. Python creates and deletes objects continually. The array has attributes shown in the .flags and .__array_interface__ dictionaries, things like the shape and dtype. The attribute that takes up (potentially) a lot of memory is the data buffer. It may be a few bytes long, or it may be many MB.
Where possible numpy operations try to avoid copying the data buffer. When indexing, it will return a view if possible. I think the documentation compares views and copies well enough.
But views are different from Python references. A shared reference means two variables (or pointers in a list or dictionary) point to the same Python object. A view is a different array object, but one which shares the data buffer with another array. A copy has its own data buffer.
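To make the three cases concrete, here is a minimal sketch. The names ref, view and copy are just illustrative, and the checks rely on the .base attribute and the owndata flag, which is one way (not the only way) to tell a view from a copy:

```python
import numpy as np

A = np.zeros((3, 4))       # A owns its data buffer

ref = A                    # shared reference: the very same Python object
view = A[1, :]             # basic slice: a new array object on A's buffer
copy = A[1, :].copy()      # a new array object with its own buffer

print(ref is A)            # True
print(view is A)           # False, it is a different object ...
print(view.base is A)      # True, ... that borrows A's buffer
print(view.flags.owndata)  # False
print(copy.base is None)   # True
print(copy.flags.owndata)  # True
```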
In your examples:
A = np.random.rand(10, 20)
A is a variable pointing to an array object. That object has a data buffer with 200 floats (200*8 bytes).
A = 2 * A # Operation 1: Is a copy of A made, and a reference assigned to A?
2*A creates a new object, with a new data buffer. None of its data values can be shared with the original A. A=... reassigns the A variable. The old A object is 'lost', and its memory is eventually garbage collected.
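If you want to see this for yourself, something like the following should do. The extra name old is my own addition: it keeps the original object alive so the two buffer addresses can be compared; without it the old buffer may be freed and its address reused.

```python
import numpy as np

A = np.random.rand(10, 20)
old = A                                         # second reference keeps the original alive
old_addr = old.__array_interface__["data"][0]   # address of the original data buffer

A = 2 * A                                       # new object, new buffer; the name A is rebound
new_addr = A.__array_interface__["data"][0]

print(A is old)                   # False
print(np.shares_memory(A, old))   # False
print(old_addr != new_addr)       # True
print(A.nbytes)                   # 1600, i.e. 200 floats * 8 bytes
```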
B = 2 * A # Operation 2: Does B get a completely different copy of A?
This 2*A operates on the new A array. The resulting object is assigned to B. A remains unchanged.
C = A # Operation 3: Does C get a reference to A?
Yes, this is just normal Python assignment. C refers to the same object as A. id(C)==id(A).
B = A[1,:] # B is a view
B is a reference to a new array object. But that object shares the data buffer with A. That's because the desired values can be found in the buffer by just starting at a different point, and using a different shape.
A[0, :] = 3
This LHS slice will change a subset of the values of A. It is similar to:
B = A[0, :]
B[:] = 3 # note: a plain B = 3 would only rebind the name B and leave A untouched
But there are subtle differences between LHS and RHS slices. On the LHS you have to pay more attention to when you get a copy as opposed to a view. I've seen this especially with expressions like A[idx1,:][:,idx2] = 3.
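Here is a small sketch of that pitfall. The index lists idx1 and idx2 are made-up values, and np.ix_ is just one way to write the assignment as a single indexing step:

```python
import numpy as np

A = np.zeros((4, 5))
idx1 = [0, 2]   # hypothetical row indices
idx2 = [1, 3]   # hypothetical column indices

# A[idx1, :] is advanced indexing and returns a copy, so the
# assignment lands in that temporary copy and A is left unchanged.
A[idx1, :][:, idx2] = 3
print(A.sum())   # 0.0

# A single indexing expression on the LHS assigns into A itself.
A[np.ix_(idx1, idx2)] = 3
print(A.sum())   # 12.0

# With basic slices the intermediate is a view, so chaining does work.
E = np.zeros((4, 5))
E[0:2, :][:, 0:2] = 1
print(E.sum())   # 4.0
```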
D = A * B * C
The details of how many intermediate copies are made in a calculation like this are buried in the numpy C code. It's safest to assume that it does something like:
temp1 = A*B
temp2 = temp1*C
D = temp2
(temp1 goes to garbage)
For ordinary calculations it isn't worth worrying about those details. If you are really pushing for speed you could do a timeit on alternatives. And occasionally we get SO questions about operations giving memory errors. Do a search to get more details on those.
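If the temporaries ever do matter, one alternative worth timing is the out= parameter of the ufuncs, which writes the result into a buffer you supply. This is only a sketch: the preallocated D2 and the helper with_out are my own names, and whether it is actually faster depends on array size and your numpy version.

```python
import timeit
import numpy as np

A = np.random.rand(10, 20)
B = np.random.rand(10, 20)
C = np.random.rand(10, 20)

# Plain expression; assume it allocates an intermediate for A * B.
D1 = A * B * C

# Explicit version: one preallocated result buffer, reused for both steps.
D2 = np.empty_like(A)

def with_out():
    np.multiply(A, B, out=D2)    # D2 = A * B, written into D2's buffer
    np.multiply(D2, C, out=D2)   # D2 = D2 * C, in place

with_out()
print(np.allclose(D1, D2))       # True

# Compare the two on your own machine; results will vary.
print(timeit.timeit(lambda: A * B * C, number=1000))
print(timeit.timeit(with_out, number=1000))
```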
Upvotes: 4
Reputation: 1929
Yes, it creates new arrays, except for C: C and A refer to the same object.
You can test all of this yourself. Try id(A) to see which object A refers to (in CPython this is the address of the array object, not of its data buffer). Also, just create a smaller structure, modify parts of it, and then see if A, B, and/or C are also updated.
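A quick version of that experiment might look like this. The view V is my own addition, included to show that id() alone does not tell you whether two arrays share data; np.shares_memory does.

```python
import numpy as np

A = np.zeros((2, 3))
B = 2 * A         # new array with its own data
C = A             # same object as A
V = A[0, :]       # different object, same data buffer

A[0, 0] = 5.0     # modify A in place

print(id(C) == id(A))            # True
print(C[0, 0], V[0], B[0, 0])    # 5.0 5.0 0.0
print(id(V) == id(A))            # False
print(np.shares_memory(V, A))    # True
```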
Upvotes: 3