Reputation: 173
I am working with a very large data series of floats in Pandas 12.0. What I am trying to do is set extreme outliers to NaNs in this series, which represents a standardized feature vector (mean is 0, std is 1).
I have no trouble making a boolean mask of the feature vector to find extreme outliers:
mask = feature_series > 10 | feature_series < 10
This takes minimal resources. However, when I attempt to actually use this mask I get a memory explosion and have to force exit before a crash occurs. This happens with:
feature_series[mask] = np.nan
It's not limited to this operation either. I also get a memory explosion with:
mask.any()
What's making this happen? I feel like it may be a bug, but I'm still relatively new to Pandas and can't be sure.
Upvotes: 1
Views: 214
Reputation: 77971
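You probably need some parentheses. The | operator binds more tightly than > and <, so without them Python evaluates 10 | feature_series first and then chains the two comparisons, which is not the mask you intended. Since the series is standardized, the lower bound is presumably meant to be -10:
mask = (feature_series > 10) | (feature_series < -10)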
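Here is a minimal sketch of the parenthesized mask on a small made-up series, in case it helps to check the behavior before running it on the full data. The sample values are invented for illustration, and the lower bound of -10 is an assumption (the question writes < 10).

```python
import numpy as np
import pandas as pd

# Small, made-up stand-in for the standardized feature vector.
feature_series = pd.Series([0.1, -0.3, 12.5, 0.7, -11.2, 1.4])

# Parenthesize each comparison so that | combines two boolean Series.
# Assumption: the lower bound is -10, not 10 as written in the question.
mask = (feature_series > 10) | (feature_series < -10)

# Work on a copy so the original series is left untouched.
cleaned = feature_series.copy()
cleaned[mask] = np.nan

print(mask.any())
print(cleaned)
```

If the bounds are symmetric, feature_series.abs() > 10 should build the same mask with a single comparison.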
Upvotes: 2