Reputation: 21877
I have a large dataframe:
peak.count purity.score material
0 10.0 0.134814 ADB
1 10.0 0.134814 ADB
2 17.0 0.116754 ADB
3 17.0 0.116754 ADB
4 17.0 0.102921 ADB
... ... ... ...
1269 14.0 0.166039 SSA
1270 14.0 0.166039 SSA
1271 14.0 0.166039 SSA
1272 12.0 0.169396 SSA
1273 12.0 0.169396 SSA
1274 12.0 0.169396 SSA
I'm curious about grouping the purity.score by a range and then counting the values within each range. For example, if 15 of my values fall between 0.1 and 0.2, I would like the output to report a count of 15 for that bin. I have tried something that uses value_counts in conjunction with a numpy range, but it does not count the values within the groups:
First I do this:
s = pd.Series(df['purity.score'])
pd.value_counts(s).reindex(np.arange(0, 1, 0.1)).fillna(0)
0.0 362.0
0.1 0.0
0.2 0.0
0.3 0.0
0.4 0.0
0.5 0.0
0.6 0.0
0.7 0.0
0.8 0.0
0.9 0.0
How can I group these values? Note that I wish to use this table to feed an API that renders a JavaScript histogram, so I do not want to use Bokeh or Matplotlib; I need direct access to the table.
Upvotes: 1
Views: 260
Reputation: 8269
You can do it with the cut function:
df.groupby(pd.cut(df['purity.score'], bins=10)).count()
Here, cut divides df['purity.score'] into 10 bins of its own choosing, but you can define the bin boundaries yourself by passing an array of edges.
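For ranges of exactly 0.1, a minimal sketch might look like the following (the sample data here is hypothetical, standing in for your dataframe):

import numpy as np
import pandas as pd

# Hypothetical sample data in place of the question's dataframe
df = pd.DataFrame({'purity.score': np.random.uniform(0, 1, 1275)})

# Explicit edges 0.0, 0.1, ..., 1.0 give ten bins of width 0.1
edges = np.arange(0, 1.1, 0.1)
counts = pd.cut(df['purity.score'], bins=edges).value_counts().sort_index()

# Convert to a plain dict so it can be serialized and sent to an API
table = {str(interval): int(n) for interval, n in counts.items()}

Note that pd.cut produces intervals that are closed on the right by default, e.g. (0.1, 0.2], and that value_counts on the resulting categorical includes empty bins as zeros, which is convenient for a histogram.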
Upvotes: 2
Reputation: 947
Your best option is probably the groupby function. To group by ranges of size 0.1 you could do this (the dataframe is called df here):
df['purity.score'].groupby((df['purity.score']*10).astype(int)).count()
The argument here is the purity.score column, multiplied by 10 and then converted to int, an operation which maps [0.1, 0.2) -> 1, [0.2, 0.3) -> 2, etc. Not very pretty, but it works.
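A small sketch of this approach with hypothetical sample data; the reindex at the end is optional and restores bins that groupby would otherwise drop because they are empty:

import pandas as pd

# Hypothetical sample data in place of the question's dataframe
df = pd.DataFrame({'purity.score': [0.05, 0.13, 0.13, 0.17, 0.25, 0.31]})

# Truncating score*10 to int labels each row with its 0.1-wide bin: 0, 1, 2, ...
bins = (df['purity.score'] * 10).astype(int)
counts = df['purity.score'].groupby(bins).count()

# groupby omits bins with no values; reindex fills them back in as zeros
counts = counts.reindex(range(10), fill_value=0)

The integer bin labels are also trivial to serialize, which may suit the API use case better than pd.cut's Interval labels.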
Upvotes: 0