jma

Reputation: 497

Pandas DataFrames acting as active views of another DataFrame

I have a main df, called df, and 3 additional dfs that were made simply by saying df2 = df, df3 = df, df4 = df. So they're set to my main df.

I added a column to df and for some reason, it was also added to df2, df3, df4. When I dropped the column from df, it also dropped from df2, df3, df4.

I've created sub-dfs from the main df before, each with a slightly different purpose, and I assumed that assignment creates a copy, and not a view, of the dataframe. Is that right?

Upvotes: 1

Views: 180

Answers (2)

EdChum

Reputation: 394179

No, you created 3 references to the original df. To make a copy, do:

df2 = df.copy()

This will make a deep copy, so that any modifications to df2 affect only the copy and not the original df.
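As a minimal sketch of the difference (the column names and values here are made up for illustration), compare a plain assignment with an explicit copy:

```python
import pandas as pd

# Hypothetical example data; any DataFrame behaves the same way.
df = pd.DataFrame({"a": [1, 2, 3]})

alias = df                # a second name for the same object
independent = df.copy()   # a new object with its own data

# Add a column through the original name.
df["b"] = [10, 20, 30]

alias_has_b = "b" in alias.columns        # the alias sees the new column
copy_has_b = "b" in independent.columns   # the copy does not
```

Checking `alias is df` is a quick way to confirm that two names refer to the same DataFrame.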

You need to be explicit in your code, to avoid any ambiguities.

Additionally, doing things like this:

df_maybe_a_view = df[some_cols]

may return a view or a copy, and modifying the result can then emit this warning:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead

This means the original df may or may not have been modified. The problem is that your intention becomes ambiguous, and it's hard to tell for sure whether your reference is operating on a view or a copy. So you have to be explicit: use copy to make a copy, and use .loc and .iloc for setting values. See the pandas docs on returning a view versus a copy.
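A small sketch of being explicit in both directions, assuming a made-up two-column frame: take a column subset with copy when you want independent data, and set values with .loc on the object you actually mean to change.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Explicit copy of a column subset: changes here never reach df.
subset = df[["a"]].copy()
subset.loc[0, "a"] = 100

# Explicit in-place change of the original, via .loc on df itself.
df.loc[0, "b"] = -1
```

Because subset was created with copy, there is no ambiguity about which object each assignment touches.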

Upvotes: 2

Corley Brigman

Reputation: 12401

You need to use copy:

Signature: pd.DataFrame.copy(self, deep=True)
Docstring:
Make a copy of this objects data.

Parameters
----------
deep : boolean or string, default True
    Make a deep copy, including a copy of the data and the indices.
    With ``deep=False`` neither the indices or the data are copied.

    Note that when ``deep=True`` data is copied, actual python objects
    will not be copied recursively, only the reference to the object.
    This is in contrast to ``copy.deepcopy`` in the Standard Library,
    which recursively copies object data.

Returns
-------
copy : type of caller
File:      /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/generic.py
Type:      function
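The note in the docstring about Python objects not being copied recursively can be illustrated with a rough sketch. The column name and list contents are hypothetical; the point is that cells holding Python lists still refer to the same list objects after copy:

```python
import pandas as pd

# Hypothetical frame whose cells hold Python lists (object dtype).
df = pd.DataFrame({"tags": [["x"], ["y"]]})
copied = df.copy()  # deep=True is the default

# Mutate, in place, the list stored in the original's first cell.
df["tags"].iloc[0].append("z")

# Only the reference was copied, so both frames hold the same list object.
shared = copied["tags"].iloc[0] is df["tags"].iloc[0]
```

If the nested objects also need to be independent, copy.deepcopy from the standard library is the tool the docstring points to.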

When you write df = pd.DataFrame(), it creates an object and assigns the name df to it. When you then write df2 = df, all that does is assign another name to the same object. This is true for all objects in Python: there are objects, and there are names bound to those objects. So when you modify an object through one name, the change is visible through every other name bound to it.

Doing df2 = df.copy() creates a new object and assigns df2 to it, which is what you want.
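The same name-binding behaviour can be seen without pandas at all. Here is a minimal sketch with a plain list (the variable names are just for illustration):

```python
a = [1, 2, 3]
b = a          # b is another name for the same list object
c = list(a)    # c is a new list with the same contents

# Modify the object through the name a.
a.append(4)

same_object = b is a                  # b and a are one object
b_sees_change = b == [1, 2, 3, 4]     # so b reflects the append
c_unchanged = c == [1, 2, 3]          # the separate list does not
```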

Upvotes: 0
