measure_theory

Reputation: 874

Replace values in DataFrame

I have a large DataFrame object where missing values are pre-coded as 0.001. These missing values only occur at the beginning of the DataFrame. For example:

df = pd.DataFrame({'a':[0.001, 0.001, 0.001, 0.50, 0.10, 0.001, 0.75]})

The problem is that sometimes there are genuine 0.001 values later in the DataFrame, not at the beginning, that I don't want to drop (like the one at index 5 in the example above).

What I want is:

df = pd.DataFrame({'a': [np.nan, np.nan, np.nan, 0.50, 0.10, 0.001, 0.75]})

But I can't figure out a simple way to drop only the 0.001 values at the beginning of the DataFrame and ignore the ones that occur later on.

The dataset I'm working with is massive, so I was hoping to avoid looping through each variable and each index (which is what I'm currently doing, but it takes too long).

Any ideas?

Upvotes: 0

Views: 103

Answers (1)

user2285236

Here's an approach:

df.mask(df[df!=0.001].ffill().isnull(), np.nan)
Out: 
       a
0    NaN
1    NaN
2    NaN
3  0.500
4  0.100
5  0.001
6  0.750

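This first selects the cells of df that do not equal 0.001 (df[df!=0.001]); the cells that hold 0.001 become NaN in this selection. If you forward fill this Series/DataFrame, every NaN that has a valid value somewhere before it gets filled, so only the leading elements stay NaN. Calling isnull() on the result gives a boolean mask that is True only for that leading run, and you can then use it as a mask on the original DataFrame.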
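To make the intermediate results easier to inspect, the one-liner can be split into steps. This is only a sketch on the example data from the question; the variable names (non_sentinel, filled, leading, result) are made up for illustration, and it assumes the real data has no genuine NaN values, since those would be indistinguishable from the masked 0.001 cells in the intermediate frames.

```python
import numpy as np
import pandas as pd

# Example data from the question: 0.001 marks "missing" only in the leading run.
df = pd.DataFrame({'a': [0.001, 0.001, 0.001, 0.50, 0.10, 0.001, 0.75]})

# Step 1: keep only cells that differ from 0.001; the 0.001 cells become NaN.
non_sentinel = df[df != 0.001]

# Step 2: forward fill. A NaN with a valid value above it is filled;
# the leading NaN values have nothing above them, so they stay NaN.
filled = non_sentinel.ffill()

# Step 3: whatever is still NaN is exactly the leading run of 0.001 values.
leading = filled.isnull()

# Step 4: blank out those cells in the original frame (mask fills with NaN by default).
result = df.mask(leading)
print(result)
```

Since ffill works column by column, the same steps apply unchanged to a DataFrame with many columns, each with its own leading run.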

Upvotes: 3
