Reputation: 874
I have a large DataFrame object where missing values are pre-coded as 0.001. These missing values only occur at the beginning of the DataFrame. For example:
df = pd.DataFrame({'a':[0.001, 0.001, 0.001, 0.50, 0.10, 0.001, 0.75]})
The problem is that sometimes there are actual 0.001 values that are not at the beginning of the DataFrame and that I don't want to drop (like in the example above).
What I want is:
df = pd.DataFrame({'a': [NaN, NaN, NaN, 0.50, 0.10, 0.001, 0.75]})
But I can't figure out a simple way to drop only the 0.001 values at the beginning of the DataFrame and ignore the others that occur later on.
The dataset I'm working with is massive, so I was hoping to avoid looping through each variable and each index (which is what I'm currently doing, but it takes a bit too long).
Any ideas?
Upvotes: 0
Views: 103
Here's an approach:
df.mask(df[df!=0.001].ffill().isnull(), np.nan)
Out:
a
0 NaN
1 NaN
2 NaN
3 0.500
4 0.100
5 0.001
6 0.750
This first creates a selection where the df does not equal 0.001; the cells that hold 0.001 become NaN in that selection. If you forward fill this Series/DataFrame, the leading NaN elements have nothing before them to fill from, so they stay NaN. You can then use those still-null positions as a mask on the original DataFrame.
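A self-contained version of the steps above (the variable names `leading` and `result` are just for illustration):

```python
import numpy as np
import pandas as pd

# Missing values are pre-coded as 0.001 and occur only at the start.
df = pd.DataFrame({'a': [0.001, 0.001, 0.001, 0.50, 0.10, 0.001, 0.75]})

# df[df != 0.001] turns every 0.001 into NaN; forward-filling leaves only
# the leading run of NaNs unfilled, so isnull() flags exactly those rows.
leading = df[df != 0.001].ffill().isnull()
result = df.mask(leading, np.nan)

print(result)
```

The later 0.001 at index 5 survives because by that point the forward fill has real values to carry forward, so its position is not null in the mask.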
Upvotes: 3