Reputation: 3
After some data manipulation and using a pivot table I was able to arrive at this desired result:
domain code count
a.com 200 10
502 5
404 8
c.com 200 31
502 9
503 15
b.com 200 5
404 2
503 1
d.com 200 47
404 22
My objective now is to sum the count for each domain and sort the domains in descending order of that count like this:
domain code count
d.com 200 47
404 22
c.com 200 31
502 9
503 15
a.com 200 10
502 5
404 8
b.com 200 5
404 2
503 1
If I do d.groupby('domain').sum()
I just get the domains and the sum of count, but I don't want to present the sum, just need it for the sorting.
Additionally I'd like to add a percentage for each code relative to each domain, like this:
domain code count %
d.com 200 47 68,115942029
404 22 31,884057971
c.com 200 31 56,363636363
502 9 16,363636363
503 15 27,272727272
a.com 200 10 etc...
502 5
404 8
b.com 200 5
404 2
503 1
Upvotes: 0
Views: 1202
Reputation: 8768
I believe you can use sort values for this.
df = df.sort_values(by=['domain','count'],ascending = (0,0))
Upvotes: 0
Reputation: 214957
You need groupby.transform
and also argsort
if you don't want to attach an extra column to the data frame:
# calculate sum by group
domain_sum = df['count'].groupby(df.domain).transform('sum')
# sort values in descending order
df = df.loc[(-domain_sum).argsort()]
# calculate percentage
df['%'] = df['count'] / domain_sum
print(df.reset_index(drop=True))
# domain code count %
#0 d.com 200 47 0.681159
#1 d.com 404 22 0.318841
#2 c.com 200 31 0.563636
#3 c.com 502 9 0.163636
#4 c.com 503 15 0.272727
#5 a.com 200 10 0.434783
#6 a.com 502 5 0.217391
#7 a.com 404 8 0.347826
#8 b.com 200 5 0.625000
#9 b.com 404 2 0.250000
#10 b.com 503 1 0.125000
Upvotes: 3