qqqwww

Reputation: 531

Filter out zeros in np.percentile

I am trying to compute deciles for the score column of a DataFrame.

I use the following code:

np.percentile(df['score'], np.arange(0, 100, 10))

My problem is that score contains a lot of zeros. How can I filter out these zeros and compute the deciles only on the remaining values?

Upvotes: 1

Views: 3815

Answers (3)

MSeifert

Reputation: 152677

You can simply mask zeros and then remove them from your column using boolean indexing:

score = df['score']
score_no_zero = score[score != 0]
np.percentile(score_no_zero, np.arange(0,100,10))

or in one step:

np.percentile(df['score'][df['score'] != 0], np.arange(0,100,10))
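
For a quick check, here is a minimal self-contained sketch of the same masking approach. The sample data is made up for illustration and is not from the question. Note that np.arange(0, 100, 10) stops at 90, so the result holds the 0th through 90th percentiles.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: four zeros and six non-zero scores.
df = pd.DataFrame({"score": [0, 0, 5, 1, 0, 3, 2, 0, 4, 6]})

# The boolean mask keeps only the rows whose score is not zero.
nonzero = df.loc[df["score"] != 0, "score"]

# np.arange(0, 100, 10) yields 0, 10, ..., 90 (100 is excluded).
deciles = np.percentile(nonzero, np.arange(0, 100, 10))
print(deciles)
```

If the maximum (100th percentile) is also wanted, np.arange(0, 101, 10) should include it.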

Upvotes: 0

piRSquared

Reputation: 294338

Consider the dataframe df

import numpy as np
import pandas as pd

df = pd.DataFrame(
    dict(score=np.random.rand(20))
).where(
    np.random.choice([True, False], (20, 1), p=(.8, .2)),
    0
)

       score
0   0.380777
1   0.559356
2   0.103099
3   0.800843
4   0.262055
5   0.389330
6   0.477872
7   0.393937
8   0.189949
9   0.571908
10  0.133402
11  0.033404
12  0.650236
13  0.593495
14  0.000000
15  0.013058
16  0.334851
17  0.000000
18  0.999757
19  0.000000

Use pd.qcut to assign each non-zero score to a decile

pd.qcut(df.loc[df.score != 0, 'score'], 10, labels=range(10))

0     4
1     6
2     1
3     9
4     3
5     4
6     6
7     5
8     2
9     7
10    1
11    0
12    8
13    8
15    0
16    3
18    9
Name: score, dtype: category
Categories (10, int64): [0 < 1 < 2 < 3 ... 6 < 7 < 8 < 9]

Or all together, adding the decile as a new column (rows with a zero score get NaN)

df.assign(decile=pd.qcut(df.loc[df.score != 0, 'score'], 10, labels=range(10)))

       score decile
0   0.380777    4.0
1   0.559356    6.0
2   0.103099    1.0
3   0.800843    9.0
4   0.262055    3.0
5   0.389330    4.0
6   0.477872    6.0
7   0.393937    5.0
8   0.189949    2.0
9   0.571908    7.0
10  0.133402    1.0
11  0.033404    0.0
12  0.650236    8.0
13  0.593495    8.0
14  0.000000    NaN
15  0.013058    0.0
16  0.334851    3.0
17  0.000000    NaN
18  0.999757    9.0
19  0.000000    NaN
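
The NaN entries appear because assign aligns the binned Series with df on the index, and the rows with a zero score were never passed to pd.qcut. A small sketch of that alignment, using made-up data and 4 bins instead of 10 to keep it short (labels=False returns integer bin codes):

```python
import pandas as pd

# Hypothetical data: two zeros among four non-zero scores.
df = pd.DataFrame({"score": [0.0, 1.0, 2.0, 0.0, 3.0, 4.0]})

# Bin only the non-zero scores; the result keeps their original index labels.
nonzero = df.loc[df["score"] != 0, "score"]
quartile = pd.qcut(nonzero, 4, labels=False)

# assign aligns on the index, so rows 0 and 3 (the zeros) have no match.
out = df.assign(quartile=quartile)
print(out)
```

Rows missing from the binned Series have no matching index label, so they end up as NaN in the new column.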

Upvotes: 0

user2285236

Filter them with boolean indexing:

df.loc[df['score']!=0, 'score']

or

df['score'][lambda x: x!=0]

and pass that to the percentile function.

np.percentile(df['score'][lambda x: x!=0], np.arange(0,100,10))
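
A short sketch, on made-up data, suggesting that the two selections above pick the same rows. With the callable form, pandas calls the function on the Series and uses the returned boolean mask for indexing.

```python
import pandas as pd

# Hypothetical sample data with two zeros.
df = pd.DataFrame({"score": [0.0, 0.7, 0.0, 0.2, 0.9]})

# Selection with .loc and an explicit boolean mask.
by_loc = df.loc[df["score"] != 0, "score"]

# Selection with a callable; x is the Series being indexed.
by_callable = df["score"][lambda x: x != 0]

# Both should contain the same values under the same index labels.
same = by_loc.equals(by_callable)
print(same)
```

The callable form avoids repeating df['score'], which can be convenient in a chain of method calls.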

Upvotes: 3
