Reputation: 73
I was explaining[1] in-place operations vs out-of-place operations to a new user of Pandas. This resulted in us discussing passing objects by reference or by value.
Naturally, I wanted to show pandas.DataFrame.values, as I thought it shared the memory location of the underlying data of the DataFrame. However, I was surprised, and then sidetracked, by the results of the following code segment.
import pandas as pd
df = pd.DataFrame({'x': [1,2,3,4],
'y': [5,4,3,2]})
print(df)
df.values += 1 # raises AttributeError
x y
0 1 5
1 2 4
2 3 3
3 4 2
<ipython-input-126-9fa9f393972b>:8: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
df.values += 1
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
5169 else:
-> 5170 object.__setattr__(self, name, value)
5171 except (AttributeError, TypeError):
AttributeError: can't set attribute
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
<ipython-input-126-9fa9f393972b> in <module>
6 print(df)
7
----> 8 df.values += 1
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
5178 stacklevel=2,
5179 )
-> 5180 object.__setattr__(self, name, value)
5181
5182 def _dir_additions(self):
AttributeError: can't set attribute
However, despite this error, if we re-examine the df, it has changed.
print(df)
x y
0 2 6
1 3 5
2 4 4
3 5 3
First, we can write df.values += 1 as df.values = df.values.__iadd__(1). That means the RHS of this expression evaluates properly, resulting in the underlying data being changed. Then, re-assigning df.values to a new value raises the exception.
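The same sequence can be reproduced without pandas. The sketch below uses a hypothetical Frame class (my own stand-in, not pandas code) whose values property has no setter and hands back its internal list directly. A plain list stands in for the NumPy array, so += here extends the list rather than adding element-wise, but the order of events is the same: get, mutate in place, then a failed set.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame; this is not pandas code."""

    def __init__(self, data):
        self._data = list(data)

    @property
    def values(self):
        # Read-only property (no setter) that returns the underlying
        # mutable list itself, not a copy.
        return self._data


frame = Frame([1, 2, 3])
error = None
try:
    # Roughly: frame.values = frame.values.__iadd__([4])
    # The in-place add succeeds first; the assignment then fails.
    frame.values += [4]
except AttributeError as exc:
    error = exc

print(type(error).__name__)
print(frame.values)  # the list was already extended when the error was raised
```

The exact wording of the AttributeError message differs between Python versions, so the sketch only prints the exception type.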
If I break up these two operations, no error is raised and the underlying data is changed.
print(df)
values = df.values
values += 1
print(df)
x y
0 2 6
1 3 5
2 4 4
3 5 3
x y
0 3 7
1 4 6
2 5 5
3 6 4
Should .values be treated differently than with __getattr__/__setattr__? Part of me wants to say this is not a bug, as the user should read the documentation and use the recommended replacement, pandas.DataFrame.to_numpy.
However, part of me says that it is pretty unintuitive to see an "AttributeError: can't set attribute" but have the underlying operation actually work. That being said, I can't think of a solution that allows these operations to work in the proper situations while still preventing improper use.
Does anyone have any insights into this?
[1]: Until I got derailed by this issue and [Insert Link] potential issue.
Upvotes: 4
Views: 108
Reputation: 9422
Pass-by-value vs. pass-by-reference in Python is a knotty topic; see "Emulating pass-by-value behaviour in python" and also read the comments under that question.
This is the 'state of the art' :
Not quite. Python passes arguments neither by reference nor by value, but by assignment.
source : https://realpython.com/python-pass-by-reference/
A similar explanation is at https://www.geeksforgeeks.org/pass-by-reference-vs-value-in-python/
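To make "by assignment" concrete, here is a minimal sketch with two hypothetical helper functions, mutate and rebind (names chosen for illustration only). The parameter is just another name bound to the caller's object: mutating through it is visible to the caller, while rebinding it is not.

```python
def mutate(items):
    # The parameter refers to the same list object as the caller's name,
    # so an in-place change is visible outside the function.
    items.append(99)


def rebind(items):
    # This builds a new list and rebinds only the local name;
    # the caller's list is left untouched.
    items = items + [100]
    return items


original = [1, 2, 3]

mutate(original)
after_mutate = list(original)   # snapshot after the in-place change

returned = rebind(original)
after_rebind = list(original)   # snapshot after the rebinding call

print(after_mutate)
print(after_rebind)
print(returned)
```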
Starting from this, I think this behavior is not a bug, but it is in a grey zone. In my opinion it is rooted in the link between Pandas and NumPy. df.values returns a NumPy representation (an array) of the DataFrame ( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html ); call it a. Then a + 1 is valid syntax for incrementing every element of a NumPy array (https://scipy-lectures.org/intro/numpy/operations.html). On the other hand, according to the warning, Pandas [!] does not allow new columns to be created via a new attribute name. That warning, and the error that follows it, come from the re-assignment step in df.values += 1, which is effectively df.values = df.values.__iadd__(1): the right-hand side increments the NumPy array in place (which is valid), and the result is then assigned back to df.values, which raises the error because values is a read-only property.
The data still changes only because the in-place addition alters the same memory location, the same object, that the DataFrame holds. So it is not really a bug, but it is not purely white either; it is grey.
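The difference between an in-place and an out-of-place addition can be checked with object identity. This is a small sketch using a plain list in place of a NumPy array (an assumption made so it runs without third-party packages); the variable names are illustrative.

```python
data = [1, 2, 3]
alias = data                      # a second name for the same list object

data += [4]                       # in place: the existing object is mutated
same_after_iadd = data is alias   # both names still refer to one object

data = data + [5]                 # out of place: a new list is built and rebound
same_after_add = data is alias    # the names now refer to different objects

print(same_after_iadd, same_after_add)
print(alias)
print(data)
```

Only the in-place form changes what other holders of the object see, which matches the observation that the DataFrame's data changed even though the assignment itself failed.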
Upvotes: 1