Reputation: 453
I have a dataframe like the following:
data = [[0,9], [10, 15], [11, 14],[1,3],[2,7],[4,31]]
df = pd.DataFrame(data, columns=['score_x', 'score_y'])
I want to put both score_x and score_y into the same bins, e.g. [0,5,10,15,20,25,30]. The I need a table with rows of score_x bins and column headers of score_y bins with the count in between:
score_y 0-5 5-10 10-15 15-20
score_x
0-5
5-10 count...
10-15
15-20
I have tried
scores=df.groupby(bins)['score_x'].agg(['count'])
and get a single column with score_x grouped
And
scores = df.groupby(['scores_x',pd.cut(df.scores_y,bins)])
scores.size().unstack()
And this gives groups in column headers but I then can't use groupby again on a groupby object.
Thanks for any suggestions! :)
Upvotes: 0
Views: 41
Reputation: 863541
Use crosstab
and add parameter label
to cut
:
bins = [0,5,10,15,20,25,30]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
df = pd.crosstab(pd.cut(df.score_x,bins, labels=labels),
pd.cut(df.score_y,bins, labels=labels))
print (df)
score_y 0-5 5-10 10-15
score_x
0-5 1 1 0
5-10 0 0 1
10-15 0 0 1
If need all bins (from list) use:
df = df.groupby([pd.cut(df.score_x,bins, labels=labels),
pd.cut(df.score_y,bins, labels=labels)]).size().unstack()
print (df)
score_y 0-5 5-10 10-15 15-20 20-25 25-30
score_x
0-5 1 1 0 0 0 0
5-10 0 0 1 0 0 0
10-15 0 0 1 0 0 0
15-20 0 0 0 0 0 0
20-25 0 0 0 0 0 0
25-30 0 0 0 0 0 0
Upvotes: 1