df.columns and df2.columns are the same object?

Question

I have a dataframe df2 which is a copy of another dataframe:

In [5]: df = DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]})
In [6]: df
Out[6]:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

In [7]: df2 = df.copy()

and is therefore not the same object:

In [8]: df is df2
Out[8]: False

In [9]: hex(id(df))
Out[9]: '0x89c6550L'

In [10]: hex(id(df2))
Out[10]: '0x89c6a58L'

My question is regarding the columns of those two dataframes. Why is it that the columns objects returned from df.columns and df2.columns are the same object?

In [11]: df.columns is df2.columns
Out[11]: True

In [12]: hex(id(df.columns))
Out[12]: '0x89bfb38L'

In [13]: hex(id(df2.columns))
Out[13]: '0x89bfb38L'

But if I make a change, then they become two separate objects?

In [14]: df2.rename(columns={"B":"D"}, inplace=True)

In [15]: df.columns
Out[15]: Index([A, B, C], dtype=object)

In [16]: df2.columns
Out[16]: Index([A, D, C], dtype=object)

In [17]: df.columns is df2.columns 
Out[17]: False

In [18]: hex(id(df.columns))
Out[18]: '0x89bfb38L'

In [19]: hex(id(df2.columns))
Out[19]: '0x89bfc88L'

Can someone explain what's going on here? Why aren't df.columns and df2.columns two separate objects from the start?

Jeff · Accepted Answer

df.columns is an Index object.

These are immutable objects, much like strings and ints: you can rebind a reference to point at a different one, but you cannot modify the object itself.
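A minimal sketch of what immutability means here, assuming a reasonably recent pandas imported as `pd` (the variable names `idx`, `new_idx` and `mutated` are only for illustration). It shows that item assignment on an Index is rejected, and that "changing" a label means building a new Index:

```python
import pandas as pd

idx = pd.Index(["A", "B", "C"])

# In-place item assignment is rejected because an Index is immutable.
try:
    idx[1] = "D"
    mutated = True
except TypeError:
    mutated = False

# "Changing" a label means constructing a new Index; the original is untouched.
new_idx = pd.Index(["D" if label == "B" else label for label in idx])

print(mutated)        # whether the in-place assignment succeeded
print(list(idx))      # labels of the original Index
print(list(new_idx))  # labels of the newly built Index
```

This is the same situation as with a Python string: `s[0] = "x"` fails, and any "modified" string is really a different object.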

Immutability makes sharing safe, which is good for performance: copying an index does not require copying its memory. When you 'change' one, you are really getting a new object, and the original is left untouched.

Almost all pandas operations return a new object; see http://pandas.pydata.org/pandas-docs/stable/basics.html#copying

So rename is equivalent to copying the frame and then assigning to whichever axis you are changing (columns, index, or both). That assignment creates a new Index object, which is why the two frames no longer share one afterwards. In other words, rename is just a convenience method for this operation.
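The equivalence described above can be sketched as follows, again assuming pandas is imported as `pd`; the names `renamed` and `manual` are illustrative. The example deliberately does not check whether a fresh copy shares the very same Index object as the original, since that is an implementation detail that may differ between pandas versions:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Convenience form: rename returns a new DataFrame by default.
renamed = df.rename(columns={"B": "D"})

# Roughly equivalent long form: copy the frame, then assign new labels.
# The assignment builds a new Index object for the copy's columns.
manual = df.copy()
manual.columns = ["A", "D", "C"]

print(list(df.columns))       # the original labels are unchanged
print(list(renamed.columns))  # labels after rename
print(list(manual.columns))   # labels after copy + assignment
```

In both forms the original frame keeps its own Index, and the result carries a different Index object with the new labels.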
