Zmann3000

Reputation: 816

How to remove runs of consecutive values shorter than a certain length from a pandas DataFrame?

If I have a pandas data frame like this:

      A
 1    1
 2    1
 3   NaN
 4    1
 5   NaN
 6    1
 7    1
 8    1
 9    1
 10  NaN
 11   1
 12   1
 13   1

How do I replace with NaN every value that belongs to a run of consecutive non-NaN values shorter than some length (four in this case), so that I get a data frame like this?

      A
 1   NaN
 2   NaN
 3   NaN
 4   NaN
 5   NaN
 6    1
 7    1
 8    1
 9    1
 10  NaN
 11  NaN
 12  NaN
 13  NaN

Upvotes: 0

Views: 62

Answers (1)

rafaelc

Reputation: 59274

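Use groupby and np.where. Every NaN starts a new group, so counting the non-null values in each group gives the length of each run: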

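import numpy as np
import pandas as pd
s = df.groupby(df.A.isnull().cumsum()).transform(lambda g: pd.notnull(g).sum())  # non-NaN count per run
df['B'] = np.where(s.A>=4, df.A, np.nan)  # keep only values in runs of at least 4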

Output:

    A   B
1   1.0 NaN
2   1.0 NaN
3   NaN NaN
4   1.0 NaN
5   NaN NaN
6   1.0 1.0
7   1.0 1.0
8   1.0 1.0
9   1.0 1.0
10  NaN NaN
11  1.0 NaN
12  1.0 NaN
13  1.0 NaN
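
For reuse, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name mask_short_runs and the parameter min_len are made up for illustration, and it uses transform("count") and Series.where in place of the lambda and np.where above, which should give the same result on this data.

```python
import numpy as np
import pandas as pd


def mask_short_runs(col: pd.Series, min_len: int) -> pd.Series:
    """Replace runs of consecutive non-NaN values shorter than min_len with NaN.

    Hypothetical helper, not part of pandas.
    """
    # Each NaN increments the counter, so every run of values gets its own
    # label (a run shares its label with the NaN that precedes it).
    run_id = col.isna().cumsum()
    # "count" ignores NaN, so this is the number of real values in each run,
    # broadcast back to every row of that run.
    run_len = col.groupby(run_id).transform("count")
    # Keep values whose run is long enough; everything else becomes NaN.
    return col.where(run_len >= min_len)


# Sample data from the question, indexed 1..13.
df = pd.DataFrame(
    {"A": [1, 1, np.nan, 1, np.nan, 1, 1, 1, 1, np.nan, 1, 1, 1]},
    index=range(1, 14),
)
df["B"] = mask_short_runs(df["A"], min_len=4)
print(df)
```

With min_len=4 only rows 6 to 9 keep their value; lowering it to 3 would also keep rows 11 to 13.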

Upvotes: 1
