Reputation: 17570
I have a dataframe df2 which is a copy of another dataframe:
In [5]: df = DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]})
In [6]: df
Out[6]:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
In [7]: df2 = df.copy()
and is therefore not the same object:
In [8]: df is df2
Out[8]: False
In [9]: hex(id(df))
Out[9]: '0x89c6550L'
In [10]: hex(id(df2))
Out[10]: '0x89c6a58L'
My question is regarding the columns of those two dataframes. Why is it that the columns objects returned from df.columns and df2.columns are the same object?
In [11]: df.columns is df2.columns
Out[11]: True
In [12]: hex(id(df.columns))
Out[12]: '0x89bfb38L'
In [13]: hex(id(df2.columns))
Out[13]: '0x89bfb38L'
But if I make a change, then they become two separate objects?
In [14]: df2.rename(columns={"B":"D"}, inplace=True)
In [15]: df.columns
Out[15]: Index([A, B, C], dtype=object)
In [16]: df2.columns
Out[16]: Index([A, D, C], dtype=object)
In [17]: df.columns is df2.columns
Out[17]: False
In [18]: hex(id(df.columns))
Out[18]: '0x89bfb38L'
In [19]: hex(id(df2.columns))
Out[19]: '0x89bfc88L'
Can someone explain what's going on here? Why aren't df.columns and df2.columns two separate objects from the start?
Upvotes: 4
Views: 2932
Reputation: 128948
df.columns is an Index object.
These are immutable objects (much like strings and ints: you can rebind a reference to one, but you cannot change the object itself).
This allows sharing, which is efficient: copying a dataframe does not require copying the memory behind its index. When you 'change' an index, you really get a new object (as opposed to a modified version of the original one).
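As a minimal sketch of that immutability (the names idx and new_idx are only illustrative, and this assumes a reasonably recent pandas), item assignment on an Index is rejected, while any operation that 'changes' labels hands back a different object:

```python
import pandas as pd

idx = pd.Index(["A", "B", "C"])

# Item assignment is not supported on an Index; pandas raises TypeError.
try:
    idx[1] = "D"
    raised = False
except TypeError:
    raised = True

# "Changing" a label means building a new Index; the original is untouched.
new_idx = idx.map(lambda label: "D" if label == "B" else label)

print(raised)           # whether the in-place assignment was rejected
print(list(idx))        # labels of the original Index
print(list(new_idx))    # labels of the newly built Index
print(new_idx is idx)   # identity check between the two objects
```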
Almost all pandas operations return a new object; see here: http://pandas.pydata.org/pandas-docs/stable/basics.html#copying
So rename is equivalent to copying and then assigning to the axis labels (columns and/or index, whichever you are changing). However, this act of assignment creates a new Index object, which is why df.columns and df2.columns stop being the same object after the rename (so rename is just a convenience method for this operation).
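A rough illustration of that equivalence, reusing the dataframe from the question (the names renamed and manual are hypothetical, and this is a sketch of the observable behaviour rather than of how rename is implemented internally):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# The convenience route: rename returns a new dataframe with new column labels.
renamed = df.rename(columns={"B": "D"})

# The manual route: copy, then assign new labels to the columns attribute.
manual = df.copy()
old_columns = manual.columns
manual.columns = ["A", "D", "C"]

# Assignment replaced the Index object instead of modifying it in place.
print(manual.columns is old_columns)

# Both routes give the same data and labels; the original df is unchanged.
print(renamed.equals(manual))
print(list(df.columns))
```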
Upvotes: 4