Bert Zangle

Reputation: 177

OWNDATA flag unreliable in both directions for numpy arrays?

After reading the questions here and here, it seems that the OWNDATA flag isn't always reliable for numpy arrays in determining whether an object is a copy of another one or not.

The answers to these questions seem to say, however, that OWNDATA sometimes produces 'false negatives' (edit: on second thought, that might be more of a 'false positive'), i.e. it answers 'false' when an object is in fact a copy and can be safely changed without changing the original.

Now I'm wondering: is the following a case of the flag also sometimes yielding false positives, i.e. claiming something is a copy, when it really isn't? (Question, part 1) Alternatively, I misunderstand what the OWNDATA flag is intended to tell me...

import numpy as np

a = np.arange(3)
b = a
print(b.flags['OWNDATA']) # True

b = b+1
print(a==b) # all False => b is copy, matching `OWNDATA` flag

b = a
print(b.flags['OWNDATA']) # True
b += 1
print(a==b) # all True => b not a copy, mismatch with `OWNDATA`?

(Question, part 2) Finally: if neither OWNDATA nor a.base is b is a reliable indicator of whether an object is a copy, then what is the right way to determine it? The questions linked above mention may_share_memory, but that one seems to be overeager in the other direction, answering 'True' on anything that is not constructed or created as an explicit np.copy of another object.

Upvotes: 0

Views: 1860

Answers (1)

hpaulj

Reputation: 231530

Looking more at your example:

a = np.arange(3)
b = a
print(b.flags['OWNDATA']) # True

b is just another name for the array that a refers to; they are the same Python object, so there is only one array and it owns its data.

b = b+1
print(a==b) # all False => b is copy, matching `OWNDATA` flag

b is now a new array, produced by the addition operation. It no longer points to the original array. You could just as well have looked at c = b+1 and tested c. The new array does not share a data buffer with the original b.
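A minimal sketch of that point, using the name c for the result so the original binding stays visible. It also uses np.shares_memory, which is not mentioned in the question (only may_share_memory is), to check for buffer overlap:

```python
import numpy as np

a = np.arange(3)
b = a          # b and a are two names for one array object
c = b + 1      # the addition allocates a new array for the result

same_object = c is b                     # is c the same Python object as b?
shares_buffer = np.shares_memory(c, b)   # do c and b overlap in memory?
print(same_object, shares_buffer, c.flags['OWNDATA'])   # False False True
```

So c is a distinct object with its own buffer, and OWNDATA being True on it is consistent with that.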

b = a
print(b.flags['OWNDATA']) # True
b += 1
print(a==b) # all True => b not a copy, mismatch with `OWNDATA`?

b has been modified in-place; it's the same array object that it was before, but with new values. Since a references the same object, it also 'appears' to be changed. b is a is still True, and that one array still owns its data buffer. You might also compare id(b) and id(a).
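A short sketch of the identity check suggested here; the variable names same_before and same_after are only for illustration:

```python
import numpy as np

a = np.arange(3)
b = a                          # plain Python name binding; no array is created
same_before = id(a) == id(b)   # both names refer to one object

b += 1                         # in-place add: writes into the existing buffer
same_after = b is a            # still the same object afterwards
print(same_before, same_after, a)   # True True [1 2 3]
```

There is only ever one array in this snippet, so OWNDATA being True is not claiming that b is a copy of anything.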

OWNDATA is meaningful only when comparing one array with the result of a view or other operation: a different array object which may or may not share a data buffer with the original.
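For contrast, here is a sketch of a case where the flag does say something, assuming a simple slice as the view and a.copy() as the copy (neither appears in the question):

```python
import numpy as np

a = np.arange(3)
v = a[1:]        # basic slicing returns a view onto a's buffer
c = a.copy()     # an explicit copy gets its own buffer

print(v.flags['OWNDATA'], v.base is a)      # False True
print(c.flags['OWNDATA'], c.base is None)   # True True

v[0] = 99        # writing through the view changes a, but not c
print(a, c)      # [ 0 99  2] [0 1 2]
```

Here v and a are different array objects sharing one buffer, which is the situation OWNDATA and base are meant to describe.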

You may need to dig more into what b=a does in Python (it's not a numpy issue), and how arrays are constructed with data buffers.

Upvotes: 1
