Reputation: 11
This is a fragment of my Python code. It changes the dataframe X_real_zeros as intended, but it also changes X. Why does this happen?
X_real_zeros = X
for column in numeric_cols:
    X_real_zeros[column] = X[column].apply(lambda x: 0 if np.isnan(x) == 1 else x)
If I apply something like this:
X['columnii'] = X[column].apply(lambda x: 0 if np.isnan(x) == 1 else x)
it doesn't change X[column] in the original dataframe X.
Upvotes: 1
Views: 349
Reputation: 391
When you assign
X_real_zeros = X
X_real_zeros becomes a second reference to the same object as X; no data is copied. So when X_real_zeros is mutated, X changes too. To get an independent dataframe, copy it explicitly:
X_real_zeros = X.copy()
This creates a new copy of X, so you can safely change X_real_zeros without affecting X.
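A minimal sketch of the difference, using a small made-up dataframe (the column name "a" and the variable names alias and independent are only for illustration, they are not from the question):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the asker's X (hypothetical values).
X = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

alias = X               # second name for the same DataFrame object
independent = X.copy()  # new DataFrame with its own data (deep copy by default)

# Replace NaN with 0 through the alias.
alias["a"] = alias["a"].fillna(0)

print(alias is X)                      # True: same object
print(independent is X)                # False: separate object
print(X["a"].isna().sum())             # 0: X was changed through the alias
print(independent["a"].isna().sum())   # 1: the copy still has its NaN
```

The `is` check is a quick way to tell whether two names refer to the same object.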
Upvotes: 0
Reputation: 1809
The problem is with the line X_real_zeros = X. Instead of a plain assignment, you should use:
X_real_zeros = X.copy()
For more information, see the question "why should I make a copy of a data frame in pandas".
Upvotes: 1
Reputation: 1590
When you write X_real_zeros = X
you don't create a copy of X called X_real_zeros. You create a new name, X_real_zeros, bound to the same dataframe, so X and X_real_zeros refer to the same object in memory. Lists and dicts behave the same way. The solution is to make an explicit copy:
X_real_zeros = X.copy()
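The same binding behaviour with a plain list and dict, as a short sketch (all names here are made up for the example):

```python
original = [1, 2, 3]
alias = original           # new name, same list
shallow = original.copy()  # new list with the same elements

alias.append(4)

print(original)  # [1, 2, 3, 4]: changed through the alias
print(shallow)   # [1, 2, 3]: the copy is unaffected

settings = {"fill": 0}
same_settings = settings   # new name, same dict
same_settings["fill"] = -1

print(settings["fill"])    # -1
```

Note that list.copy() and dict.copy() are shallow, so nested mutable objects are still shared; copy.deepcopy covers that case.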
Upvotes: 0