Pavel Sitnikov

Reputation: 11

Python lambda function with 2 dataframes

This is a fragment of my Python code. It changes the dataframe X_real_zeros as intended, but it also changes X. Why does this happen?

X_real_zeros = X
for column in numeric_cols:
     X_real_zeros[column] = X[column].apply(lambda x: 0 if np.isnan(x) == 1 else x)

If I apply something like this:

X['columnii'] = X[column].apply(lambda x: 0 if np.isnan(x) == 1 else x)

This doesn't change X[column] in the original dataframe X.

Upvotes: 1

Views: 349

Answers (3)

codefire

Reputation: 391

When you assign

X_real_zeros = X

X_real_zeros does not get its own data; it becomes a second name for the same object that X refers to. So when X_real_zeros is mutated, X changes too. To get an independent copy, do:

X_real_zeros = X.copy()

This creates a new copy of X, so you can safely change X_real_zeros without touching X.
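A minimal sketch of the difference, using a small made-up dataframe in place of your X (the column names and values here are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's X and numeric_cols.
X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})
numeric_cols = ["a", "b"]

# Plain assignment: alias is just a second name for the same object.
alias = X
same_object = alias is X

# Explicit copy: an independent dataframe (copy() is deep by default).
X_real_zeros = X.copy()
for column in numeric_cols:
    X_real_zeros[column] = X[column].apply(lambda x: 0 if np.isnan(x) else x)

# Count the NaNs remaining in each dataframe after the loop.
nans_left_in_X = int(X.isna().sum().sum())
nans_left_in_copy = int(X_real_zeros.isna().sum().sum())
```

Here same_object is True, while the loop only fills the NaNs in X_real_zeros and leaves X as it was.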

Upvotes: 0

2342G456DI8

Reputation: 1809

The problem is this line: X_real_zeros = X. Instead of a plain assignment, you should use:

X_real_zeros = X.copy()

You may refer to "why should I make a copy of a data frame in pandas" for more information.
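As a side note, once you work on a copy, the loop with apply could probably be swapped for pandas' fillna. This is a different technique from the one in the question, sketched here with a made-up dataframe (the columns and values are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's X and numeric_cols.
X = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0], "name": ["x", "y"]})
numeric_cols = ["a", "b"]

# Work on an independent copy so X itself is left alone.
X_real_zeros = X.copy()

# Replace NaN with 0 in the numeric columns only.
X_real_zeros[numeric_cols] = X_real_zeros[numeric_cols].fillna(0)
```

X still contains its NaN values afterwards; only X_real_zeros has them replaced.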

Upvotes: 1

polku

Reputation: 1590

When you do X_real_zeros = X, you don't create a copy of X called X_real_zeros. You create a new name, X_real_zeros, bound to the same dataframe, so X and X_real_zeros refer to the same object in memory. It works the same way as with lists or dicts. The solution is to make an explicit copy:

X_real_zeros = X.copy()
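To see the same behaviour with plain lists and dicts, here is a small sketch (the variable names and values are made up for illustration):

```python
# A list and a second name bound to the same list.
values = [1, 2, 3]
alias = values
alias.append(4)          # mutating through alias also changes values

# An explicit copy is a separate list.
independent = values.copy()
independent.append(5)    # values is not affected by this

# Dicts behave the same way under plain assignment.
settings = {"fill": 0}
settings_alias = settings
settings_alias["fill"] = -1   # settings["fill"] is now -1 as well
```

After this runs, values is [1, 2, 3, 4], independent is [1, 2, 3, 4, 5], and settings["fill"] is -1.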

Upvotes: 0
