JZ.

Reputation: 21877

Using Pandas how do I count groups of data?

I have a large dataframe:

      peak.count  purity.score material  
0           10.0      0.134814      ADB  
1           10.0      0.134814      ADB  
2           17.0      0.116754      ADB  
3           17.0      0.116754      ADB  
4           17.0      0.102921      ADB   
...          ...           ...      ...  
1269        14.0      0.166039      SSA  
1270        14.0      0.166039      SSA  
1271        14.0      0.166039      SSA  
1272        12.0      0.169396      SSA  
1273        12.0      0.169396      SSA  
1274        12.0      0.169396      SSA 

I would like to group purity.score into ranges and then count the values that fall within each range. For example, if 15 of my values fall between 0.1 and 0.2, I would like the output to reflect 15 at 1. I have tried using value_counts in conjunction with a NumPy range, but it does not count the values within the groups:

First I do this: s = pd.Series(df['purity.score'])

pd.value_counts(s).reindex(np.arange(0,1,0.1)).fillna(0)
0.0    362.0
0.1      0.0
0.2      0.0
0.3      0.0
0.4      0.0
0.5      0.0
0.6      0.0
0.7      0.0
0.8      0.0
0.9      0.0

How can I group these values? Note that I want to use this table to feed an API that renders a JavaScript histogram, and I do not wish to use Bokeh or Matplotlib. I need access to the table itself.

Upvotes: 1

Views: 260

Answers (2)

foglerit

Reputation: 8269

You can do it with the cut function:

df.groupby(pd.cut(df['purity.score'], bins=10)).count()

Here, cut divides df['purity.score'] into 10 equal-width bins spanning the range of the data. You can instead define the bin boundaries yourself by passing an array of edges as bins.
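As a rough sketch of the explicit-boundaries variant, assuming the scores lie in [0, 1) and that bins of width 0.1 are wanted: the sample values, the edges array, and the payload field names below are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real dataframe.
df = pd.DataFrame(
    {"purity.score": [0.134814, 0.134814, 0.116754, 0.102921, 0.166039, 0.25, 0.93]}
)

# Eleven edges give ten bins: [0.0, 0.1), [0.1, 0.2), ..., [0.9, 1.0).
edges = np.linspace(0, 1, 11)

# right=False makes each bin closed on the left and open on the right.
# A score of exactly 1.0 would fall outside the last bin and become NaN.
binned = pd.cut(df["purity.score"], bins=edges, right=False)

# The result is categorical, so value_counts reports every bin, including
# empty ones; sort_index puts the bins back in interval order.
counts = binned.value_counts().sort_index()

# Plain list of dicts that could be serialized for an API (field names are illustrative).
payload = [{"bin_start": float(iv.left), "count": int(n)} for iv, n in counts.items()]
```

Because the bins are fixed up front, empty ranges still show up with a count of 0, which is usually what a histogram renderer expects.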

Upvotes: 2

sulkeh

Reputation: 947

Your best option is probably the groupby function. To group by ranges of width 0.1, you could do this (the dataframe is called df here):

 df['purity.score'].groupby((df['purity.score']*10).astype(int)).count()

The groupby argument is the purity.score column multiplied by 10 and then converted to int, an operation which maps [0.1, 0.2) -> 1, [0.2, 0.3) -> 2, etc. Not very pretty, but it works.
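One thing to be aware of is that groupby only returns the bins that actually occur in the data. A possible way to get a full 0 to 9 table, sketched here with made-up sample values and assuming all scores lie in [0, 1), is to reindex the result:

```python
import pandas as pd

# Hypothetical sample standing in for the real dataframe.
df = pd.DataFrame(
    {"purity.score": [0.134814, 0.134814, 0.116754, 0.102921, 0.166039, 0.25, 0.93]}
)

scores = df["purity.score"]

# Map each score to the integer index of its 0.1-wide bin, e.g. 0.134814 -> 1.
bin_index = (scores * 10).astype(int)

# Count scores per bin, then reindex so that empty bins appear with a count of 0.
# Any score of 1.0 or more would land in bin 10 or higher and be dropped here.
counts = scores.groupby(bin_index).count().reindex(range(10), fill_value=0)
```

The resulting Series is indexed by bin number, so counts.tolist() gives the ten bar heights in order.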

Upvotes: 0
