Reputation: 497
I have a main df, called df, and 3 additional dfs that were made simply by saying df2 = df, df3 = df, df4 = df. So they're set to my main df.
I added a column to df and for some reason, it was also added to df2, df3, df4. When I dropped the column from df, it also dropped from df2, df3, df4.
I've created sub-dfs like this before for slightly different purposes from the main df, and I assumed the assignment creates a copy, not a view, of the dataframe. Is that right?
Upvotes: 1
Views: 180
Reputation: 394179
No, you created 3 references to the original df. To make a copy, do:
df2 = df.copy()
This will make a deep copy so that any modifications affect the copy and not the original df.
You need to be explicit in your code, to avoid any ambiguities.
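As a minimal sketch of the difference (the column names and values here are made up for illustration), plain assignment only adds a second name for the same DataFrame, while copy() produces an independent object:

```python
import pandas as pd

# Hypothetical data, just for illustration.
df = pd.DataFrame({"a": [1, 2, 3]})

alias = df                 # a second name bound to the same object
independent = df.copy()    # a new object with its own data

# Adding a column to df is visible through the alias but not through the copy.
df["b"] = df["a"] * 10

print(alias is df)                   # True
print("b" in alias.columns)          # True
print("b" in independent.columns)    # False
```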
Additionally doing things like this:
df_maybe_a_view = df[some_cols]
This may return a view, and later modifications to it can then trigger:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
That warning may mean that the original df has been modified.
The problem here is that your intentions become ambiguous, and it's hard to tell for sure whether your reference is operating on a view or not. So you have to be explicit: use copy to make a copy, and use .loc and .iloc for setting values; see the docs.
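A small sketch of being explicit in both directions, assuming a made-up frame and a hypothetical some_cols selection: take a copy when you want an independent subset, and use .loc on the original when you really do want to change it.

```python
import pandas as pd

# Hypothetical data and column selection, for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
some_cols = ["a"]

# Explicit copy: edits to the subset never reach df.
subset = df[some_cols].copy()
subset.loc[0, "a"] = 100

# Explicit in-place edit: .loc on df itself changes the original.
df.loc[1, "b"] = 50

print(df.loc[0, "a"])      # unchanged in the original
print(subset.loc[0, "a"])  # changed only in the copy
print(df.loc[1, "b"])      # changed in the original
```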
Upvotes: 2
Reputation: 12401
You need to use copy:
Signature: pd.DataFrame.copy(self, deep=True)
Docstring:
Make a copy of this objects data.
Parameters
----------
deep : boolean or string, default True
    Make a deep copy, including a copy of the data and the indices.
    With ``deep=False`` neither the indices or the data are copied.
    Note that when ``deep=True`` data is copied, actual python objects
    will not be copied recursively, only the reference to the object.
    This is in contrast to ``copy.deepcopy`` in the Standard Library,
    which recursively copies object data.
Returns
-------
copy : type of caller
File: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/generic.py
Type: function
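The note in the docstring about Python objects not being copied recursively can be illustrated roughly like this. The frame below is invented for the example: ordinary values are independent after copy(), but a list stored in an object column is still the same list in both frames.

```python
import pandas as pd

# Hypothetical frame with a plain numeric column and a column of Python lists.
df = pd.DataFrame({"n": [1, 2], "tags": [["x", "y"], ["z"]]})
copied = df.copy()  # deep=True is the default

# Ordinary values are independent after the copy.
copied.loc[0, "n"] = 99

# The list objects themselves are shared: only references were copied,
# so mutating the list in place is visible through both frames.
copied.at[0, "tags"].append("new")

print(df.at[0, "n"])     # still the original value
print(df.at[0, "tags"])  # the shared list, including the appended item
```

If nested objects also need to be independent, copy.deepcopy from the standard library is the tool the docstring points to.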
When you write df = pd.DataFrame(), it creates an object and binds the name df to it. When you then write df2 = df, all that's doing is binding another name to the same object. This is true for all objects in Python: there are objects, and there are names bound to those objects. So when you modify an object and other names point to the same object, the change is of course visible through all of them.
Doing df2 = df.copy() creates a new object and assigns df2 to it, which is what you want.
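The same name-binding behaviour can be seen without pandas at all. Here is a rough sketch using a plain dict (the variable names are arbitrary):

```python
# Names and objects in plain Python, no pandas involved.
original = {"a": [1, 2, 3]}

alias = original              # another name for the same dict
independent = dict(original)  # a new dict object (a shallow copy)

original["b"] = [4, 5, 6]

print(alias is original)         # True: one object, two names
print("b" in alias)              # True: the change is visible via the alias
print(independent is original)   # False: a separate object
print("b" in independent)        # False: the copy did not get the new key
```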
Upvotes: 0