Reputation: 349
I want to know how many people gave grades 1, 2, 3, 4 and 5 within the groups of people who made evaluations once, 2-3 times, and more than 3 times. For instance, the group of people who evaluated exactly once contains the persons with IDs 2 and 4. In this group there is in total one grade 5 and one grade 1.
df =
ID_PERSON EVALUATION_GRADE
1 2
1 2
1 3
1 5
2 5
3 2
3 5
3 1
4 1
5 2
5 1
5 1
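For reference, the frame above can be rebuilt with a few lines (the column names are the ones shown in the question):

```python
import pandas as pd

# Rebuild the example frame from the question: one row per evaluation
df = pd.DataFrame({
    'ID_PERSON':        [1, 1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 5],
    'EVALUATION_GRADE': [2, 2, 3, 5, 5, 2, 5, 1, 1, 2, 1, 1],
})
```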
The result should look like this:
result =
FREQUENCY_GROUP GRADE_1 GRADE_2 GRADE_3 GRADE_4 GRADE_5
"1 time" 1 0 0 0 1
"2-3 times" 3 2 0 0 1
"> 3 times" 0 2 1 0 1
If I do df.groupby(['EVALUATION_GRADE']).agg({'ID_PERSON': 'count'}).reset_index(), then I get the total number of people who gave grades 1, 2, 3, 4 and 5. But how can I split them into frequency groups?
Upvotes: 4
Views: 90
Reputation: 117345
Well, the basic idea could be as follows: use GroupBy.transform
to get the frequency groups and then pandas.crosstab
to pivot the data:
>>> def worker(x):
...     if len(x) == 1:
...         return "1 time"
...     elif len(x) <= 3:
...         return "2-3 times"
...     else:
...         return "> 3 times"
...
>>> df['FREQUENCY_GROUP'] = df.groupby('ID_PERSON')['EVALUATION_GRADE'].transform(worker)
>>> df
ID_PERSON EVALUATION_GRADE FREQUENCY_GROUP
0 1 2 > 3 times
1 1 2 > 3 times
2 1 3 > 3 times
3 1 5 > 3 times
4 2 5 1 time
5 3 2 2-3 times
6 3 5 2-3 times
7 3 1 2-3 times
8 4 1 1 time
9 5 2 2-3 times
10 5 1 2-3 times
11 5 1 2-3 times
>>> pd.crosstab(df['FREQUENCY_GROUP'], 'GRADE ' + df['EVALUATION_GRADE'].astype('str'))
EVALUATION_GRADE GRADE 1 GRADE 2 GRADE 3 GRADE 5
FREQUENCY_GROUP
1 time 1 0 0 1
2-3 times 3 2 0 1
> 3 times 0 2 1 1
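Note that the crosstab above has no GRADE 4 column, because grade 4 never occurs in the data. One way to force all five columns, sketched here as a self-contained variant of the same approach (frame and column names as in the question), is to reindex the result afterwards:

```python
import pandas as pd

df = pd.DataFrame({
    'ID_PERSON':        [1, 1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 5],
    'EVALUATION_GRADE': [2, 2, 3, 5, 5, 2, 5, 1, 1, 2, 1, 1],
})

def worker(x):
    # Bucket a person's evaluation count into a frequency-group label
    if len(x) == 1:
        return "1 time"
    elif len(x) <= 3:
        return "2-3 times"
    else:
        return "> 3 times"

df['FREQUENCY_GROUP'] = df.groupby('ID_PERSON')['EVALUATION_GRADE'].transform(worker)

ct = pd.crosstab(df['FREQUENCY_GROUP'],
                 'GRADE ' + df['EVALUATION_GRADE'].astype(str))
# Force columns GRADE 1 .. GRADE 5, filling absent grades (here GRADE 4) with 0
ct = ct.reindex(columns=[f'GRADE {i}' for i in range(1, 6)], fill_value=0)
```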
Upvotes: 1
Reputation: 61947
Here is an answer that should generalize to any number of grades or ids.
d = {1: '1 time', 2: '2-3 times', 3: '2-3 times', 4: '> 3 times'}
df['FREQUENCY_GROUP'] = df.groupby('ID_PERSON')['ID_PERSON'] \
    .transform('size') \
    .clip(upper=4) \
    .map(d)
df1 = df.pivot_table(index='FREQUENCY_GROUP',
columns='EVALUATION_GRADE',
values='ID_PERSON',
aggfunc='count',
fill_value=0)\
.reindex(columns=range(df.EVALUATION_GRADE.min(),
df.EVALUATION_GRADE.max() + 1),
fill_value=0)
df1.columns = 'GRADE_' + df1.columns.astype(str)
GRADE_1 GRADE_2 GRADE_3 GRADE_4 GRADE_5
FREQUENCY_GROUP
1 time 1 0 0 0 1
2-3 times 3 2 0 0 1
> 3 times 0 2 1 0 1
Upvotes: 1
Reputation: 862501
You can first use transform
with size
to find the frequency, then cut
to create the bins, and finally groupby
with size
, reshaping by unstack
and filling the missing columns with reindex
:
import numpy as np
import pandas as pd

df['FREQ'] = df.groupby('ID_PERSON')['EVALUATION_GRADE'].transform('size')
bins = [-np.inf, 1, 3, np.inf]
labels = ['1 time', '2-3 times', '> 3 times']
df.FREQ = pd.cut(df.FREQ, bins=bins, labels=labels)
df = df.groupby(['FREQ', 'EVALUATION_GRADE'])['EVALUATION_GRADE'] \
       .size() \
       .unstack(fill_value=0) \
       .reindex(columns=np.arange(1, 6), fill_value=0)
df.columns = 'GRADE ' + df.columns.astype(str)
print(df)
GRADE 1 GRADE 2 GRADE 3 GRADE 4 GRADE 5
FREQ
1 time 1 0 0 0 1
2-3 times 3 2 0 0 1
> 3 times 0 2 1 0 1
Upvotes: 3
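As a variant of the same idea, the per-person counts can also be computed once with value_counts and mapped back onto the rows before the pivot/reindex step. A sketch, assuming the frame is named df as in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID_PERSON':        [1, 1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 5],
    'EVALUATION_GRADE': [2, 2, 3, 5, 5, 2, 5, 1, 1, 2, 1, 1],
})

# One evaluation count per person, binned into the three frequency groups
sizes = df['ID_PERSON'].value_counts()
freq = pd.cut(sizes, bins=[-np.inf, 1, 3, np.inf],
              labels=['1 time', '2-3 times', '> 3 times'])

out = (df.assign(FREQ=df['ID_PERSON'].map(freq))
         .pivot_table(index='FREQ', columns='EVALUATION_GRADE',
                      values='ID_PERSON', aggfunc='count', fill_value=0)
         .reindex(columns=range(1, 6), fill_value=0))  # include absent grade 4
out.columns = 'GRADE_' + out.columns.astype(str)
```

This trades the row-wise transform for a single value_counts on the person column, which can be cheaper on large frames.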