Nate

Issues with groupby and aggregate in pandas

I'm not sure what I'm doing wrong here. This is my code:

df['PV_SUM'] = df.groupby('DOCKET').agg({'PV':sum})

It is not returning any results, just an empty series.

This is my hypothetical dataframe:

DOCKET    PV
1a        1
1a        1 
1a        1
1b        0
1b        1
1b        1

And this is the result I'm looking for:

DOCKET    PV      PV_SUM
1a        1         3
1a        1         3
1a        1         3
1b        0         2
1b        1         2
1b        1         2

What am I doing wrong? The dtype of DOCKET is object and the dtype of PV is float. I've changed the dtype of PV to int, but no luck.


Answers (1)

Use transform instead:

df['PV_SUM'] = df.groupby('DOCKET')['PV'].transform('sum')

Output:

  DOCKET  PV  PV_SUM
0     1a   1       3
1     1a   1       3
2     1a   1       3
3     1b   0       2
4     1b   1       2
5     1b   1       2
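Here is a self-contained version of the transform approach. It rebuilds the example dataframe from the question, with PV as float as the asker describes. It passes the string 'sum' instead of the builtin sum. Newer pandas releases warn when a Python builtin is handed to transform, so the string form is the safer spelling (this is an assumption about your pandas version).

```python
import pandas as pd

# Example data from the question (PV stored as float)
df = pd.DataFrame({
    "DOCKET": ["1a", "1a", "1a", "1b", "1b", "1b"],
    "PV": [1.0, 1.0, 1.0, 0.0, 1.0, 1.0],
})

# transform returns a Series with the same index (and length) as df,
# so every row receives the total of its own DOCKET group.
df["PV_SUM"] = df.groupby("DOCKET")["PV"].transform("sum")

print(df)
```

Because the transformed Series already shares df's 0..5 index, no alignment problem can occur.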

The issue with your code is that df.groupby('DOCKET').agg({'PV':sum}) returns a dataframe with DOCKET as its index and PV as its only value column. When you assign it back to the original dataframe, pandas aligns the values on the index. The original index (0 to 5) shares no labels with the grouped index ('1a' and '1b'), so every row of the new column ends up as NaN.
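The following is a minimal reproduction of that alignment behaviour using the example data from the question. It selects the PV column from the agg result so that a Series is assigned. The variable name agg_result is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "DOCKET": ["1a", "1a", "1a", "1b", "1b", "1b"],
    "PV": [1.0, 1.0, 1.0, 0.0, 1.0, 1.0],
})

# One row per group, indexed by the labels '1a' and '1b'
agg_result = df.groupby("DOCKET").agg({"PV": "sum"})

# df is indexed 0..5, so it shares no labels with agg_result
shared = df.index.intersection(agg_result.index)
print(len(shared))  # 0

# Assignment aligns on index labels; with no matches, every row is NaN
df["PV_SUM"] = agg_result["PV"]
print(df["PV_SUM"].isna().all())  # True
```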

For example, take a look at the output from df.groupby('DOCKET').agg({'PV':sum}):

        PV
DOCKET    
1a       3
1b       2
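This result works like a lookup table keyed by DOCKET. Another option, which the answer does not use, is to look up each row's DOCKET in the table with Series.map. That keeps df's original index and row order. The sketch below assumes the same example data.

```python
import pandas as pd

df = pd.DataFrame({
    "DOCKET": ["1a", "1a", "1a", "1b", "1b", "1b"],
    "PV": [1.0, 1.0, 1.0, 0.0, 1.0, 1.0],
})

# Aggregated totals as a Series indexed by DOCKET
totals = df.groupby("DOCKET")["PV"].sum()

# Series.map with a Series argument looks up each DOCKET value
# in totals' index and returns the matching total.
df["PV_SUM"] = df["DOCKET"].map(totals)

print(df)
```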

Because pandas aligns on the index, you could also set the index of your dataframe to DOCKET first. The assignment then works as expected:

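result = df.groupby('DOCKET').agg({'PV': 'sum'})  # one row per DOCKET, indexed by DOCKET
df = df.set_index('DOCKET')                       # make df's index labels match result's
df['PV_SUM'] = result['PV']                       # aligns on DOCKET, repeating each group total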
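Here is a runnable version of the set_index approach. It ends with a reset_index call so that DOCKET becomes a regular column again and the frame regains a 0..5 index. The reset_index step is an addition for convenience and is optional.

```python
import pandas as pd

df = pd.DataFrame({
    "DOCKET": ["1a", "1a", "1a", "1b", "1b", "1b"],
    "PV": [1.0, 1.0, 1.0, 0.0, 1.0, 1.0],
})

result = df.groupby("DOCKET").agg({"PV": "sum"})

# With DOCKET as the (non-unique) index, df's labels match result's
# labels, so each group total is repeated for every row in that group.
df = df.set_index("DOCKET")
df["PV_SUM"] = result["PV"]

# Optional: turn DOCKET back into a column and restore a 0..n-1 index
df = df.reset_index()
print(df)
```

This works because result's index is unique. pandas can reindex a uniquely indexed Series onto a target index that contains repeated labels.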

