khajiit

Reputation: 95

How to replace values only in a subset of a pandas DataFrame

I'm trying to replace a subset of values in my pandas DataFrame: the negative values in every column that is not of object dtype should become NaN. This is the solution I've come up with, but it runs far too slowly (it still hadn't finished after 5 minutes).

new = df.loc[:, df.dtypes != "O"]
new = new.mask(new < 0)
df.loc[:, df.dtypes != "O"] = new
df
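For reference, here is the question's approach applied to a small hypothetical DataFrame (the column names and values are invented, and the numeric columns are float so no dtype change is involved). Each negative number in a non-object column becomes NaN and the string column is left alone.

```python
import pandas as pd

# Hypothetical sample data: one object (string) column and two float columns.
df = pd.DataFrame({
    "name": pd.Series(["a", "b", "c"], dtype=object),
    "x": [1.0, -2.0, 3.0],
    "y": [-0.5, 2.5, -1.0],
})

# Select the non-object columns, turn their negative values into NaN,
# and write the result back into the same columns.
new = df.loc[:, df.dtypes != "O"]
new = new.mask(new < 0)
df.loc[:, df.dtypes != "O"] = new

print(df)
```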

Upvotes: 1

Views: 660

Answers (1)

Valdi_Bo

Reputation: 31011

NumPy operations on plain arrays usually carry less overhead than the equivalent pandas operations.

So use the following code, based on np.where:

import numpy as np

for col in df:
    if df[col].dtype != 'O':
        # Keep non-negative values, replace everything else with NaN
        df[col] = np.where(df[col] >= 0, df[col], np.nan)
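A self-contained run of the loop above on a small hypothetical DataFrame (invented names and values; the string column is created with dtype=object explicitly so that the `!= 'O'` check skips it):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one string, one int and one float column.
df = pd.DataFrame({
    "name": pd.Series(["a", "b", "c"], dtype=object),
    "ints": [1, -2, 3],
    "floats": [-0.5, 2.5, -1.0],
})

for col in df:
    if df[col].dtype != "O":
        # Keep non-negative values; negatives (and existing NaN,
        # since NaN >= 0 is False) become NaN.
        df[col] = np.where(df[col] >= 0, df[col], np.nan)

print(df)
print(df.dtypes)
```

Note that an int column containing a negative value comes back as float64, because NaN cannot be stored in a NumPy integer array.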

I tested this with %timeit on a DataFrame of shape (50000, 3) (1 string, 1 int and 1 float column): my code took about a third of the time your code did, whereas the other solution was only marginally faster than yours.

A note about using %timeit: since your code alters the source DataFrame, before each test you have to:

  • create the DataFrame again (or copy it from some source),
  • run %timeit with -r1 and -n1 options (perform a single test run).

Otherwise, subsequent executions of the tested code operate on an already changed DataFrame (the result of the previous execution).
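A small illustration of that pitfall, using hypothetical data: after one run of the loop there are no negative values left and the int column has become float, so a second run on the same frame would time a different workload.

```python
import numpy as np
import pandas as pd


def replace_negatives(df):
    # The np.where loop from this answer; modifies df in place.
    for col in df:
        if df[col].dtype != "O":
            df[col] = np.where(df[col] >= 0, df[col], np.nan)


# Hypothetical sample data.
df = pd.DataFrame({
    "name": pd.Series(["a", "b", "c", "d"], dtype=object),
    "ints": [5, -1, -7, 2],
})

print((df["ints"] < 0).sum())   # 2 negatives before the first run
replace_negatives(df)
print((df["ints"] < 0).sum())   # 0: the frame has already been changed
print(df["ints"].dtype)         # float64: the int column was upcast

# A second run now sees float data with NaN and no negatives,
# so timing it would not measure the original task.
replace_negatives(df)
```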

Upvotes: 1
