Create count matrix based on two columns and respective range values pandas

Question

I have a dataframe which looks like below,

name,value,id
meanerror,0.55,aa
meanamount,120,aa
meanerror,0.45,bb
meanamount,150,bb
meanerror,0.88,cc
meanamount,110,cc
meanerror,0.1,dd
meanamount,50,dd

I would like to create a matrix from this dataframe like below.

,         meanamount,    total_y
meanerror,0-100,100-200
0.0-0.5,    1,    1,      2   
0.5-1,      0,    2,      2
total_x,    1,    3

what I actually need is, in the matrix, each cell should contain count of ids which has value(from value column) in the range on both x and y axis of the matrix. i.e for example the first cell should contain count of ids with meanamount in range 0-100 and meanerror in range 0.0-5.

I have tried pandas pivot table and crosstab but unsure how to achieve this. Can anyone help?

tdy · Accepted Answer

Create a pivot table:

pt = df.pivot(index='id', columns='name', values='value')

# name  meanamount  meanerror
# id                         
# aa         120.0       0.55
# bb         150.0       0.45
# cc         110.0       0.88
# dd          50.0       0.10

Cut the amounts and errors into bins:

pt['meanamount'] = pd.cut(pt['meanamount'], bins=range(0, 300, 100))
pt['meanerror'] = pd.cut(pt['meanerror'], bins=np.arange(0, 1.5, 0.5))

# name  meanamount   meanerror
# id                          
# aa    (100, 200]  (0.5, 1.0]
# bb    (100, 200]  (0.0, 0.5]
# cc    (100, 200]  (0.5, 1.0]
# dd      (0, 100]  (0.0, 0.5]

Create a crosstab of error x amount:

pd.crosstab(pt['meanerror'], pt['meanamount'], margins=True)

# meanamount  (0, 100]  (100, 200]  All
# meanerror                            
# (0.0, 0.5]         1           1    2
# (0.5, 1.0]         0           2    2
# All                1           3    4

Create count matrix based on two columns and respective range values pandas

Answers (1)

Related Questions