GeorgeLPerkins

Reputation: 1146

Pandas Indexing-View-Versus-Copy

I have a dataframe with several columns. Later, a column titled 'Active' is added. If the 'Volume' column contains anything greater than 0, I need to set 'Active' to 1.

This is a simple example of how I've attempted it:

import pandas as pd

active_df = pd.DataFrame(columns=['Volume'])
active_df['Volume'] = 0, 0, 22, 22, 0, 22, 0, 22, 0, 22
active_df['Active'] = 0

active_df['Active'].loc[active_df['Volume'] > 0] = 1

print(active_df)

Although this produces the expected results, I constantly get a warning: "A value is trying to be set on a copy of a slice from a DataFrame"

I have read the referenced page: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy but still can't solve this.

I thought that I had dealt with this in other code and resolved it, but I can't find an example in existing code.

Upvotes: 3

Views: 4112

Answers (2)

GeorgeLPerkins

Reputation: 1146

I rediscovered this question after a recent upvote, about a year after I asked it. Having learned a lot more about Pandas since then, I thought I'd revisit the difference between my 'copy of a slice' attempt and the solution.

My original attempt was:

active_df['Active'].loc[active_df['Volume'] > 0] = 1

This was a convoluted approach at best.

First I'm getting boolean values for active_df['Volume'] > 0. Then, where the row value is True, I'm setting the slice active_df['Active'] to 1. Although this worked, it was uncertain whether that slice was a view or a copy of the dataframe.
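To make those two steps visible, here is a minimal sketch that splits the chained assignment into separate statements. The names mask and active_col are my own, added only for illustration. Whether active_df itself ends up modified depends on the pandas version, which is exactly the uncertainty the warning is about, so the sketch only inspects the intermediate Series.

```python
import pandas as pd

active_df = pd.DataFrame({'Volume': [0, 0, 22, 22, 0, 22, 0, 22, 0, 22]})
active_df['Active'] = 0

# Step 1: build the boolean mask, one True/False per row.
mask = active_df['Volume'] > 0

# Step 2: pull the 'Active' column out as its own Series object.
# (active_col is a hypothetical name; it may be a view or a copy.)
active_col = active_df['Active']

# Step 3: assign into that intermediate Series, not into active_df directly.
# pandas may emit a warning here, depending on the version.
active_col.loc[mask] = 1

print(active_col)
```

The assignment in step 3 always lands on active_col. Whether it also reaches active_df is the part that pandas cannot guarantee.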


The solution was:

active_df.loc[active_df['Volume'] > 0, 'Active'] = 1

In the active_df dataframe, locate the rows where active_df['Volume'] > 0, and the column 'Active', and set those values to 1.

Or stated a different way: Set a value of 1 for the 'Active' column for the rows that have a value greater than 0 in the 'Volume' column.

So you are really working on the whole dataframe (active_df.loc) instead of the slice and possible copy (active_df['Active'].loc).
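As a quick check, here is the example from the question rewritten with the single .loc call. This is only a sketch: I build the dataframe from a dict for brevity, which is not how the question constructed it.

```python
import pandas as pd

active_df = pd.DataFrame({'Volume': [0, 0, 22, 22, 0, 22, 0, 22, 0, 22]})
active_df['Active'] = 0

# One .loc call on the dataframe itself:
# the row mask comes first, the column label second.
active_df.loc[active_df['Volume'] > 0, 'Active'] = 1

print(active_df)
```

Because the row selection and the column selection happen in one indexing operation on active_df, there is no intermediate object that could turn out to be a copy. 'Active' ends up as 1 in exactly the rows where 'Volume' is 22.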

Thank you again to @Deena for providing the solution.

Upvotes: 8

Deena

Reputation: 6213

I believe the copy-versus-view internals differ between versions, since I don't get that warning using 0.20.3.
I would totally understand if the latest releases moved some of the view operations to copies, given the amount of confusion and the possible bugs they caused.

The safest option for all the versions is:

active_df.loc[active_df['Volume'] > 0, 'Active'] = 1

And you can always double check if the filtered dataframe is a copy or a view:

active_df['Active'].loc[active_df['Volume'] > 0].is_view 

Upvotes: 1
