user13948
user13948

Reputation: 453

Transpose row to column and groupby

I have a dataframe like the following:

data = [[0,9], [10, 15], [11, 14],[1,3],[2,7],[4,31]]
  
df = pd.DataFrame(data, columns=['score_x', 'score_y'])

I want to put both score_x and score_y into the same bins, e.g. [0,5,10,15,20,25,30]. The I need a table with rows of score_x bins and column headers of score_y bins with the count in between:

         score_y  0-5   5-10   10-15   15-20
score_x
0-5                     

5-10                      count...

10-15

15-20

I have tried

scores=df.groupby(bins)['score_x'].agg(['count'])

and get a single column with score_x grouped

And

scores = df.groupby(['scores_x',pd.cut(df.scores_y,bins)])
scores.size().unstack()

And this gives groups in column headers but I then can't use groupby again on a groupby object.

Thanks for any suggestions! :)

Upvotes: 0

Views: 41

Answers (1)

jezrael
jezrael

Reputation: 863541

Use crosstab and add parameter label to cut:

bins = [0,5,10,15,20,25,30]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])] 

df = pd.crosstab(pd.cut(df.score_x,bins, labels=labels), 
                 pd.cut(df.score_y,bins, labels=labels))
print (df)
score_y  0-5  5-10  10-15
score_x                  
0-5        1     1      0
5-10       0     0      1
10-15      0     0      1

If need all bins (from list) use:

df = df.groupby([pd.cut(df.score_x,bins, labels=labels), 
                 pd.cut(df.score_y,bins, labels=labels)]).size().unstack()
print (df)
score_y  0-5  5-10  10-15  15-20  20-25  25-30
score_x                                       
0-5        1     1      0      0      0      0
5-10       0     0      1      0      0      0
10-15      0     0      1      0      0      0
15-20      0     0      0      0      0      0
20-25      0     0      0      0      0      0
25-30      0     0      0      0      0      0

Upvotes: 1

Related Questions