Reputation: 531
I am trying to decile the column score of a DataFrame.
I use the following code:
np.percentile(df['score'], np.arange(0, 100, 10))
My problem is that score contains lots of zeros. How can I filter out these 0 values and decile only the remaining values?
Upvotes: 1
Views: 3815
Reputation: 152677
You can simply mask zeros and then remove them from your column using boolean indexing:
score = df['score']
score_no_zero = score[score != 0]
np.percentile(score_no_zero, np.arange(0,100,10))
or in one step:
np.percentile(df['score'][df['score'] != 0], np.arange(0,100,10))
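A self-contained sketch of the masking approach, in case it helps to see it run end to end. The sample values below are made up for illustration and are not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: several zeros mixed in with real scores.
df = pd.DataFrame({'score': [0, 0, 5, 1, 0, 3, 2, 4, 0, 6]})

# The boolean mask keeps only the rows whose score is non-zero.
score_no_zero = df['score'][df['score'] != 0]

# Cut points at the 0th, 10th, ..., 90th percentiles of the non-zero scores.
cut_points = np.percentile(score_no_zero, np.arange(0, 100, 10))
print(cut_points)
```

Note that np.arange(0, 100, 10) stops at 90, so the result holds ten cut points starting at the minimum of the non-zero scores and does not include the maximum.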
Upvotes: 0
Reputation: 294338
Consider the dataframe df
df = pd.DataFrame(
    dict(score=np.random.rand(20))
).where(
    np.random.choice([True, False], (20, 1), p=(.8, .2)),
    0
)
       score
0   0.380777
1   0.559356
2   0.103099
3   0.800843
4   0.262055
5   0.389330
6   0.477872
7   0.393937
8   0.189949
9   0.571908
10  0.133402
11  0.033404
12  0.650236
13  0.593495
14  0.000000
15  0.013058
16  0.334851
17  0.000000
18  0.999757
19  0.000000
Use pd.qcut to decile:
pd.qcut(df.loc[df.score != 0, 'score'], 10, labels=range(10))
0 4
1 6
2 1
3 9
4 3
5 4
6 6
7 5
8 2
9 7
10 1
11 0
12 8
13 8
15 0
16 3
18 9
Name: score, dtype: category
Categories (10, int64): [0 < 1 < 2 < 3 ... 6 < 7 < 8 < 9]
Or all together:
df.assign(decile=pd.qcut(df.loc[df.score != 0, 'score'], 10, labels=range(10)))
       score  decile
0   0.380777     4.0
1   0.559356     6.0
2   0.103099     1.0
3   0.800843     9.0
4   0.262055     3.0
5   0.389330     4.0
6   0.477872     6.0
7   0.393937     5.0
8   0.189949     2.0
9   0.571908     7.0
10  0.133402     1.0
11  0.033404     0.0
12  0.650236     8.0
13  0.593495     8.0
14  0.000000     NaN
15  0.013058     0.0
16  0.334851     3.0
17  0.000000     NaN
18  0.999757     9.0
19  0.000000     NaN
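Since the frame above is random, here is a deterministic sketch of the same idea that can be checked by hand. The scores 1 through 20 plus three zeros are an assumption chosen so that each decile ends up with exactly two rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three zeros followed by the scores 1..20.
df = pd.DataFrame({'score': [0.0, 0.0, 0.0] + [float(v) for v in range(1, 21)]})

# Decile only the non-zero scores; labels 0..9 name the ten bins.
nonzero = df.loc[df['score'] != 0, 'score']
deciles = pd.qcut(nonzero, 10, labels=range(10))

# assign aligns on the index, so rows with a zero score get a missing decile.
result = df.assign(decile=deciles)
print(result)
```

The zero rows are absent from the qcut result, so index alignment in assign is what leaves them as missing values rather than dropping them from the frame.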
Upvotes: 0
Reputation:
Filter them with boolean indexing:
df.loc[df['score']!=0, 'score']
or
df['score'][lambda x: x!=0]
and pass the result to the percentile function:
np.percentile(df['score'][lambda x: x!=0], np.arange(0,100,10))
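A quick sketch to confirm that the two selection forms pick the same rows, using a small made-up column (the values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two zeros.
df = pd.DataFrame({'score': [0.0, 2.5, 0.0, 1.0, 4.0, 3.0]})

# Form 1: boolean mask passed to .loc together with the column name.
via_loc = df.loc[df['score'] != 0, 'score']

# Form 2: a callable indexer; pandas calls it with the Series itself.
via_callable = df['score'][lambda x: x != 0]

# Both forms yield the same Series (same index, same values).
same = via_loc.equals(via_callable)

# Percentile cut points of the non-zero scores.
cut_points = np.percentile(via_callable, np.arange(0, 100, 10))
print(same, cut_points)
```

The callable form is mostly a convenience for method chains, where the intermediate Series has no name to refer to in a mask.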
Upvotes: 3