Reputation: 381
I have a dataframe which looks like below,
name,value,id
meanerror,0.55,aa
meanamount,120,aa
meanerror,0.45,bb
meanamount,150,bb
meanerror,0.88,cc
meanamount,110,cc
meanerror,0.1,dd
meanamount,50,dd
I would like to create a matrix from this dataframe like below.
, meanamount, total_y
meanerror,0-100,100-200
0.0-0.5, 1, 1, 2
0.5-1, 0, 2, 2
total_x, 1, 3
what I actually need is, in the matrix, each cell should contain count of ids which has value(from value column) in the range on both x and y axis of the matrix. i.e for example the first cell should contain count of ids with meanamount in range 0-100 and meanerror in range 0.0-5.
I have tried pandas pivot table and crosstab but unsure how to achieve this. Can anyone help?
Upvotes: 2
Views: 1361
Reputation: 41327
Create a pivot table:
pt = df.pivot(index='id', columns='name', values='value')
# name meanamount meanerror
# id
# aa 120.0 0.55
# bb 150.0 0.45
# cc 110.0 0.88
# dd 50.0 0.10
Cut the amounts and errors into bins:
pt['meanamount'] = pd.cut(pt['meanamount'], bins=range(0, 300, 100))
pt['meanerror'] = pd.cut(pt['meanerror'], bins=np.arange(0, 1.5, 0.5))
# name meanamount meanerror
# id
# aa (100, 200] (0.5, 1.0]
# bb (100, 200] (0.0, 0.5]
# cc (100, 200] (0.5, 1.0]
# dd (0, 100] (0.0, 0.5]
Create a crosstab of error x amount:
pd.crosstab(pt['meanerror'], pt['meanamount'], margins=True)
# meanamount (0, 100] (100, 200] All
# meanerror
# (0.0, 0.5] 1 1 2
# (0.5, 1.0] 0 2 2
# All 1 3 4
Upvotes: 3