Reputation: 95
I'm currently trying to figure out how to replace a subset of values in my pandas DataFrame. This is the solution I've come up with, but it runs too slowly (still hasn't terminated after 5 minutes).
new = df.loc[:, df.dtypes != "O"]
new = new.mask(new < 0)
df.loc[:, df.dtypes != "O"] = new
df
Upvotes: 1
Views: 660
Reputation: 31011
NumPy operations are often faster than their pandas equivalents.
So use the following code based on np.where:
import numpy as np

# Replace negative values with NaN in every non-object column
for col in df:
    if df[col].dtype != 'O':
        df[col] = np.where(df[col] >= 0, df[col], np.nan)
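To make the effect of this loop concrete, here is a minimal sketch on a small made-up DataFrame. The column names and values are hypothetical, and the string column is created with an explicit object dtype so that the `!= 'O'` check skips it regardless of the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one string, one int and one float column.
# The string column is forced to object dtype so the dtype check skips it.
df = pd.DataFrame({
    "name": pd.Series(["a", "b", "c"], dtype=object),
    "count": [1, -2, 3],
    "value": [-0.5, 1.5, 2.5],
})

for col in df:
    if df[col].dtype != "O":
        # Keep non-negative values, replace everything else with NaN
        df[col] = np.where(df[col] >= 0, df[col], np.nan)

print(df)
```

Note that the integer column comes back as float, because NaN cannot be stored in a plain NumPy integer column.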
I ran a test using %timeit on a DataFrame of shape (50000, 3) (1 string, 1 int and 1 float column), and this code took about one third of the time your code needed, whereas the other solution was only marginally faster than yours.
A note about using %timeit: because your code alters the source DataFrame, you have to restore the original DataFrame before each test (for example, by re-creating it or by working on a fresh copy).
Otherwise subsequent executions of the tested code operate on an already changed DataFrame (the result of the previous execution).
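One way to follow this advice outside a notebook is sketched below with the standard `timeit` module instead of the `%timeit` magic. The test data, the name `original`, and the helper `replace_negatives` are assumptions made for illustration and are not the DataFrame used in the measurement above. Each run receives a fresh copy, so the cost of the copy is included in the reported time.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data of shape (50000, 3): string, int and float columns
rng = np.random.default_rng(0)
n = 50_000
original = pd.DataFrame({
    "s": pd.Series(["x"] * n, dtype=object),
    "i": rng.integers(-10, 10, size=n),
    "f": rng.normal(size=n),
})


def replace_negatives(df):
    """Replace negative values with NaN in every non-object column."""
    for col in df:
        if df[col].dtype != "O":
            df[col] = np.where(df[col] >= 0, df[col], np.nan)
    return df


runs = 10
# original.copy() gives every run an unmodified DataFrame to work on
elapsed = timeit.timeit(lambda: replace_negatives(original.copy()), number=runs)
print(f"{elapsed / runs:.4f} s per run (including the copy)")
```

To compare two approaches fairly, time each of them the same way, so that the copy overhead is the same on both sides.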
Upvotes: 1