I've created a pandas DataFrame, train_data, using pd.read_csv(). I then create a new DataFrame, X, from a subset of the columns of train_data, and add a new column to X using a boolean mask built from an existing column in X.
import pandas as pd

# this code issues the SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame.
# Try using .loc[row_indexer,col_indexer] = value instead
train_data = pd.read_csv("train.csv")
X = train_data[['a', 'b', 'c', 'd', 'e']]
X['f'] = (X['e'] == value).astype(float)
# this code does not issue the SettingWithCopyWarning
train_data = pd.read_csv("train.csv")
X = train_data[['a', 'b', 'c', 'd', 'e']].copy()
X['f'] = (X['e'] == value).astype(float)
The only difference is that the second snippet explicitly makes X a copy. I thought X was already a copy (not a view), given how it was created. Why does the first snippet trigger the warning? I am using pandas version 1.4.4.
I tried to recreate the problem with a simple example (see below), but I was unsuccessful.
# df1 and df3 are identical
# this code does not issue the SettingWithCopyWarning
df1 = pd.DataFrame({'a': ['x', 'x', 'y'], 'b': ['x', 'x', 'y']})
df2 = df1[['a']]
df2['b'] = (df2['a'] == 'x').astype(float)
# this code does not issue the SettingWithCopyWarning
df3 = pd.read_csv("example.csv")
df4 = df3[['a']]
df4['b'] = (df4['a'] == 'x').astype(float)
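To make the comparison easier to reproduce without a CSV file, here is a minimal sketch that records warnings instead of relying on what gets printed. The helper name warns_on_new_column, the frame named source, and the column name 'f' are made up for illustration. The "always" filter is there in case an earlier identical warning was shown once and then suppressed; whether that plays any role here is only a guess. The result for the plain subset may differ between pandas versions, so no expected output is given for it.

```python
import warnings

import pandas as pd


def warns_on_new_column(frame: pd.DataFrame) -> bool:
    """Return True if adding a column to `frame` emits SettingWithCopyWarning."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones already shown earlier in the session.
        warnings.simplefilter("always")
        frame["f"] = (frame["a"] == "x").astype(float)
    # Match on the class name, since the warning's import path has moved
    # between pandas versions.
    return any(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


# Hypothetical stand-in for the data read from the CSV file.
source = pd.DataFrame({"a": ["x", "x", "y"], "b": ["x", "x", "y"]})

plain_subset_warns = warns_on_new_column(source[["a"]])
explicit_copy_warns = warns_on_new_column(source[["a"]].copy())

print("pandas", pd.__version__)
print("plain subset warns: ", plain_subset_warns)
print("explicit copy warns:", explicit_copy_warns)
```

Running this in a fresh interpreter and posting the printed lines alongside the pandas version should show whether the plain subset and the explicit copy really behave differently.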