Reputation: 1772
I have the following df:
df3 = pd.DataFrame(np.array([['Iza', 'Tuesday'],['Martin', 'Friday'],['John', 'Monday'],['Iza', 'Tuesday'],['Iza', 'Tuesday'],['Iza', 'Wednesday'],['Sara', 'Friday'], ['Sara', 'Friday'], ['Sara', 'Sunday'],['Silvia', 'Monday'],['Silvia', 'Wednesday'],['Paul', 'Monday'],['Paul', 'Tuesday'],['Paul', 'Wednesday']]),
columns=['Name', 'Day'])
df3:
Name Day
0 Iza Tuesday
1 Martin Friday
2 John Monday
3 Iza Tuesday
4 Iza Tuesday
5 Iza Wednesday
6 Sara Friday
7 Sara Friday
8 Sara Sunday
9 Silvia Monday
10 Silvia Wednesday
11 Paul Monday
12 Paul Tuesday
13 Paul Wednesday
I got the count of days for each user:
oo = df3.groupby(['Name','Day'])['Day'].size().reset_index(name='counts')
result:
Name Day counts
0 Iza Tuesday 3
1 Iza Wednesday 1
2 John Monday 1
3 Martin Friday 1
4 Paul Monday 1
5 Paul Tuesday 1
6 Paul Wednesday 1
7 Sara Friday 2
8 Sara Sunday 1
9 Silvia Monday 1
10 Silvia Wednesday 1
dropped unwanted users with only one day record;
uniq_us = oo[oo.duplicated(['Name'], keep=False)]
result:
Name Day counts
0 Iza Tuesday 3
1 Iza Wednesday 1
4 Paul Monday 1
5 Paul Tuesday 1
6 Paul Wednesday 1
7 Sara Friday 2
8 Sara Sunday 1
9 Silvia Monday 1
10 Silvia Wednesday 1
Now I want to get the percentage of counts in each grouped days by name:
uniq_us.groupby(['Name','Day'])['counts'].apply(lambda x: x.value_counts(normalize=True)) * 100
I got:
Name Day
Iza Tuesday 3 100.0
Wednesday 1 100.0
Paul Monday 1 100.0
Tuesday 1 100.0
Wednesday 1 100.0
Sara Friday 2 100.0
Sunday 1 100.0
Silvia Monday 1 100.0
Wednesday 1 100.0
Name: counts, dtype: float64
I do not know how can I calculate it per grouped Name
Desired output:
Name Day
Iza Tuesday 3 75.0
Wednesday 1 25.0
Paul Monday 1 33.33
Tuesday 1 33.33
Wednesday 1 33.33
Sara Friday 2 66.66
Sunday 1 33.34
Silvia Monday 1 50.0
Wednesday 1 50.0
Name: counts, dtype: float64
Upvotes: 2
Views: 57
Reputation: 215057
Another option is to normalize the count at early stage:
(df3.groupby('Name')
.Day
.value_counts(normalize=True)
.mul(100)
.rename('Counts')
.reset_index()
.pipe(lambda x: x[x.duplicated(['Name'], keep=False)]))
# Name Day Counts
#0 Iza Tuesday 75.000000
#1 Iza Wednesday 25.000000
#4 Paul Monday 33.333333
#5 Paul Tuesday 33.333333
#6 Paul Wednesday 33.333333
#7 Sara Friday 66.666667
#8 Sara Sunday 33.333333
#9 Silvia Monday 50.000000
#10 Silvia Wednesday 50.000000
Upvotes: 1
Reputation: 18316
You can normalize the counts with their sum via transform
:
uniq_us["pcnt"] = uniq_us.groupby("Name").counts.transform(lambda x: x / x.sum())
to get
>>> uniq_us
Name Day counts pcnt
0 Iza Tuesday 3 0.750000
1 Iza Wednesday 1 0.250000
4 Paul Monday 1 0.333333
5 Paul Tuesday 1 0.333333
6 Paul Wednesday 1 0.333333
7 Sara Friday 2 0.666667
8 Sara Sunday 1 0.333333
9 Silvia Monday 1 0.500000
10 Silvia Wednesday 1 0.500000
You can put 100 *
and round(2)
in lambda
and set the Name
and Day
as the index to match the output:
...transform(lambda x: (100 * x / x.sum()).round(2))
uniq_us = uniq_us.set_index(["Name", "Day"])
to get
counts pcnt
Name Day
Iza Tuesday 3 75.00
Wednesday 1 25.00
Paul Monday 1 33.33
Tuesday 1 33.33
Wednesday 1 33.33
Sara Friday 2 66.67
Sunday 1 33.33
Silvia Monday 1 50.00
Wednesday 1 50.00
Upvotes: 2
Reputation: 23146
You're almost there. Try:
>>> uniq_us.groupby(["Name", "Day"]).sum()/uniq_us.groupby("Name").sum()
counts
Name Day
Iza Tuesday 0.750000
Wednesday 0.250000
Paul Monday 0.333333
Tuesday 0.333333
Wednesday 0.333333
Sara Friday 0.666667
Sunday 0.333333
Silvia Monday 0.500000
Wednesday 0.500000
Upvotes: 1