user12983086
user12983086

Reputation:

how to divide these values using pivot or groupby table

have a list of ranks from 1-100 divided among 1000 people for multiple exams

Snippet:

name     rank  
mark     1
stuart   2
lee      15
lenord   8
sheldon  99
cubbon   26
stuart   35
lee      40
lenord   66
mark     9
sheldon  1
cubbon   2
mark     100
mark     6

using pivot and groupby how to divide this something like this based on count

name        0-9   10-19 20-29  30-39 40-49 50-59 60-69 70-79 80-89 90-100

mark         3       0    0      0     0     0     0     0    0      1
stuart       1       0    0      1     0     0     0     0    0      0   
lee          0       1    0      0     1     0     0     0    0      0   
lenord       1       0    0      0     0     0     1     0    0      0  
sheldon      1       0    0      0     0     0     0     0    0      1 
cubbon       1       0    1      0     0     0     0     0    0      0  

tried pivot and groupby , but how to create the columns 0-10 ..... 90-100 automatically rather than manually

Tried this : but it is taking long time

rank_1_10=df[(df['rank'] >= 0) & (df['rank'] <= 10)]

rank_1_10=rank_1_10.groupby(['name']).agg({'rank': 'count'})

......

rank_100=df[(df['rank'] >= 90) & (df['rank'] <= 100)]

rank_10=rank_100.groupby(['name']).agg({'rank': 'count'})

Then i'm merging all these, is there any easy way

Upvotes: 1

Views: 197

Answers (1)

jezrael
jezrael

Reputation: 863166

Use cut by np.arange and labels created by zip, last value is different, so is corrected adding by 1 in bins and subtracted 1 in labels:

bins = np.arange(0, 110, 10)
#dynamically create labels
labels = ['{}-{}'.format(i, j-1) for i, j in zip(bins[:-1], bins[1:])] 
bins[-1] += 1
labels[-1] = '{}-{}'.format(bins[-2], bins[-1]-1)

df['binned'] = pd.cut(df['rank'], bins = bins, labels=labels, right=False)
print (df)
       name  rank  binned
0      mark     1     0-9
1    stuart     2     0-9
2       lee    15   10-19
3    lenord     8     0-9
4   sheldon    99  90-100
5    cubbon    26   20-29
6    stuart    35   30-39
7       lee    40   40-49
8    lenord    66   60-69
9      mark     9     0-9
10  sheldon     1     0-9
11   cubbon     2     0-9
12     mark   100  90-100
13     mark     6     0-9

Then for count is used crosstab with DataFrame.reindex for add all not exist columns:

df1 = pd.crosstab(df['name'], df['binned']).reindex(labels, axis=1, fill_value=0)
print (df1)
binned   0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-100
name                                                                        
cubbon     1      0      1      0      0      0      0      0      0       0
lee        0      1      0      0      1      0      0      0      0       0
lenord     1      0      0      0      0      0      1      0      0       0
mark       3      0      0      0      0      0      0      0      0       1
sheldon    1      0      0      0      0      0      0      0      0       1
stuart     1      0      0      1      0      0      0      0      0       0

Upvotes: 1

Related Questions