pragmatic learner
pragmatic learner

Reputation: 447

Find total from groupby.size() command

I have a line of code which produces a tabled count from a groupby command of two columns:

beat_pri = df_2011.groupby(['Beat', 'Priority']).size()

table produced:

Beat  Priority    Count
01X   1.0          394
      2.0         1816

02X   1.0          644
      2.0         1970

02Y   1.0          661

                  ... 

What I am trying to do from this is create a new column 'Total' which is the total count grouped by the beat column. For e.g.

Beat     Priority   Count  Total    
01X       1.0       394     2210
          2.0       1816

02X       1.0       644     2614
          2.0       1970

02Y       1.0       661     2970
          2.0       2309
        ... 

What I have tried so far is:

beat_pri2011['Total'] = df_2011.groupby(['Beat']).size().to_frame('total')

However this produces:

Beat   Priority  Count  Total   
01X    1.0       394    NaN
       2.0       1816   NaN

02X    1.0       644    NaN
       2.0       1970   NaN

02Y    1.0       661    NaN
       2.0       2309   NaN

                    ...

Thanks in advance.

(I have re-typed this question to clear up any confusion it brought earlier on.)

Upvotes: 2

Views: 106

Answers (1)

Dev Khadka
Dev Khadka

Reputation: 5461

I mis-understood your question earlier. What I understand from you recent edit is you want to sort the aggregates after groupby by total count.

Is the following what you wanted

df = pd.DataFrame([('ST&SAN P...', '1.0', '06X', '1.0', 'PDOA', 'POSSIBLE DEAD PERSON'), ('ST&HANNAH ST', '1.0', '07X', '1.0', '415GS', '415 GUNSHOTS'), ('ST&MARKET ST', '1.0', '07X', '2.0', '415GS', '415 GUNSHOTS'), ('PRENTISS ST', '2.0', '06X', '2.0', '415GS', '415 GUNSHOTS'), ('AV&FOOTH...', '2.0', '07X', '1.0', '415GS', '415 GUNSHOTS')], columns=('Location', 'Area-Id', 'Beat', 'Priority', 'Type-Id', 'Incident Type Description'))
df2 = df.groupby(["Beat", "Priority"])[["Location"]].count() \
    .rename(dict(Location="Count"), axis=1).reset_index()

df2.merge(df2.groupby("Beat").sum(), on="Beat", suffixes=("", "_Total")) \
    .sort_values("Count_Total", ascending=False)

Result

    Beat    Priority    Count   Count_Total
2   07X 1.0 2   3
3   07X 2.0 1   3
0   06X 1.0 1   2
1   06X 2.0 1   2

Upvotes: 2

Related Questions