user23504599

Why do I get a SettingWithCopyWarning when manipulating a pandas DataFrame in one situation but not in another, seemingly identical one?

I've created a pandas DataFrame, train_data, using pd.read_csv(). I then create a new DataFrame, X, from a subset of train_data's columns, and add a new column to X using a boolean mask built from an existing column of X.

# this code issues the SettingWithCopyWarning: 
# A value is trying to be set on a copy of a slice from a DataFrame.
# Try using .loc[row_indexer,col_indexer] = value instead
import pandas as pd

train_data = pd.read_csv("train.csv")
X = train_data[['a', 'b', 'c', 'd', 'e']]
X['f'] = (X['e'] == value).astype(float)

# this code does not issue the SettingWithCopyWarning 
train_data = pd.read_csv("train.csv")
X = train_data[['a', 'b', 'c', 'd', 'e']].copy()
X['f'] = (X['e'] == value).astype(float)
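To rule out anything specific to train.csv, here is a self-contained version of the two snippets that counts the warnings instead of just printing them. The DataFrame standing in for train_data, its extra columns extra1 and extra2, the value 2, and the helper count_setting_with_copy_warnings are all hypothetical; the real file's other columns are not shown here.

```python
import warnings

import pandas as pd


def count_setting_with_copy_warnings(func):
    """Call func() and return how many SettingWithCopyWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        func()
    # Match on the class name, because the module that defines the warning
    # class has moved between pandas versions
    return sum(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


# Hypothetical stand-in for train.csv: the five selected columns plus two extras
train_data = pd.DataFrame(
    {col: [1, 2, 3] for col in ["a", "b", "c", "d", "e", "extra1", "extra2"]}
)
value = 2  # hypothetical value to match in column 'e'


def without_copy():
    X = train_data[["a", "b", "c", "d", "e"]]
    X["f"] = (X["e"] == value).astype(float)
    return X


def with_copy():
    X = train_data[["a", "b", "c", "d", "e"]].copy()
    X["f"] = (X["e"] == value).astype(float)
    return X


print("without .copy():", count_setting_with_copy_warnings(without_copy))
print("with .copy():", count_setting_with_copy_warnings(with_copy))
```

On pandas 1.4.x I would expect the first count to be nonzero and the second to be zero. Newer releases that enable Copy-on-Write change this behaviour, so the counts may differ there.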

The only difference is that I explicitly made X a copy in the second snippet, but I thought X was already a copy (not a view), given how it was created. Why do I get the warning in the first case? I am using pandas 1.4.4.
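For reference, one way to build X with the extra column without relying on whether the selection is a copy or a view is DataFrame.assign, which is documented to return a new object containing the original columns plus the new ones. This is a minimal sketch; train_data, extra1, extra2 and value = 2 are hypothetical stand-ins for the real data.

```python
import warnings

import pandas as pd

# Hypothetical stand-ins for train.csv and `value` from the question
train_data = pd.DataFrame(
    {col: [1, 2, 3] for col in ["a", "b", "c", "d", "e", "extra1", "extra2"]}
)
value = 2

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised in this block
    # assign() returns a new DataFrame with the extra column 'f'
    # instead of setting a value on the selected slice
    X = train_data[["a", "b", "c", "d", "e"]].assign(
        f=lambda d: (d["e"] == value).astype(float)
    )

# Count SettingWithCopyWarnings by class name (its module varies by version)
n_copy_warnings = sum(
    w.category.__name__ == "SettingWithCopyWarning" for w in caught
)
print(X)
print("SettingWithCopyWarnings:", n_copy_warnings)
```

This sidesteps the warning rather than explaining it, which is why I am still curious about the difference between the two snippets.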

I tried to reproduce the problem with a simple example (see below), but the warning did not appear.

import pandas as pd

# df1 and df3 are identical (example.csv holds the same data as df1)

# this code does not issue the SettingWithCopyWarning 
df1 = pd.DataFrame({'a': ['x', 'x', 'y'], 'b': ['x', 'x', 'y']})
df2 = df1[['a']]
df2['b'] = (df2['a'] == 'x').astype(float)

# this code does not issue the SettingWithCopyWarning 
df3 = pd.read_csv("example.csv")
df4 = df3[['a']]
df4['b'] = (df4['a'] == 'x').astype(float)
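One difference between the toy example and the real case that might matter (this is my reading of the pandas 1.x source, not something I have confirmed for 1.4.4): pandas appears to skip the SettingWithCopyWarning when the parent DataFrame has the same shape as the child after the new column has been inserted. In the toy example df1 has two columns and df2 also ends up with two, whereas in the real case X ends up with six columns, presumably a different number from train_data. The sketch below tests that idea by comparing df1 with a hypothetical wider parent, df_wide, that has an extra column 'c'; setting_with_copy_count is likewise a hypothetical helper.

```python
import warnings

import pandas as pd


def setting_with_copy_count(parent):
    """Select column 'a' from parent, add column 'b', count SettingWithCopyWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        child = parent[["a"]]
        child["b"] = (child["a"] == "x").astype(float)
    return sum(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


# Same data as df1 above: the child ends with 2 columns, like the parent
df1 = pd.DataFrame({"a": ["x", "x", "y"], "b": ["x", "x", "y"]})
# Hypothetical wider parent: the extra column 'c' makes the shapes differ
df_wide = df1.assign(c=["x", "x", "y"])

print("2-column parent:", setting_with_copy_count(df1))
print("3-column parent:", setting_with_copy_count(df_wide))
```

If that reading of the source is right, the 3-column parent should trigger the warning on pandas 1.4.x while the 2-column parent does not.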
