Reputation:
have a list of ranks from 1-100 divided among 1000 people for multiple exams
Snippet:
name rank
mark 1
stuart 2
lee 15
lenord 8
sheldon 99
cubbon 26
stuart 35
lee 40
lenord 66
mark 9
sheldon 1
cubbon 2
mark 100
mark 6
using pivot and groupby how to divide this something like this based on count
name 0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90-100
mark 3 0 0 0 0 0 0 0 0 1
stuart 1 0 0 1 0 0 0 0 0 0
lee 0 1 0 0 1 0 0 0 0 0
lenord 1 0 0 0 0 0 1 0 0 0
sheldon 1 0 0 0 0 0 0 0 0 1
cubbon 1 0 1 0 0 0 0 0 0 0
tried pivot and groupby , but how to create the columns 0-10 ..... 90-100 automatically rather than manually
Tried this : but it is taking long time
rank_1_10=df[(df['rank'] >= 0) & (df['rank'] <= 10)]
rank_1_10=rank_1_10.groupby(['name']).agg({'rank': 'count'})
......
rank_100=df[(df['rank'] >= 90) & (df['rank'] <= 100)]
rank_10=rank_100.groupby(['name']).agg({'rank': 'count'})
Then i'm merging all these, is there any easy way
Upvotes: 1
Views: 197
Reputation: 863166
Use cut
by np.arange
and labels created by zip
, last value is different, so is corrected adding by 1
in bins and subtracted 1
in labels:
bins = np.arange(0, 110, 10)
#dynamically create labels
labels = ['{}-{}'.format(i, j-1) for i, j in zip(bins[:-1], bins[1:])]
bins[-1] += 1
labels[-1] = '{}-{}'.format(bins[-2], bins[-1]-1)
df['binned'] = pd.cut(df['rank'], bins = bins, labels=labels, right=False)
print (df)
name rank binned
0 mark 1 0-9
1 stuart 2 0-9
2 lee 15 10-19
3 lenord 8 0-9
4 sheldon 99 90-100
5 cubbon 26 20-29
6 stuart 35 30-39
7 lee 40 40-49
8 lenord 66 60-69
9 mark 9 0-9
10 sheldon 1 0-9
11 cubbon 2 0-9
12 mark 100 90-100
13 mark 6 0-9
Then for count is used crosstab
with DataFrame.reindex
for add all not exist columns:
df1 = pd.crosstab(df['name'], df['binned']).reindex(labels, axis=1, fill_value=0)
print (df1)
binned 0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90-100
name
cubbon 1 0 1 0 0 0 0 0 0 0
lee 0 1 0 0 1 0 0 0 0 0
lenord 1 0 0 0 0 0 1 0 0 0
mark 3 0 0 0 0 0 0 0 0 1
sheldon 1 0 0 0 0 0 0 0 0 1
stuart 1 0 0 1 0 0 0 0 0 0
Upvotes: 1