Julio Arriaga

Reputation: 970

Aggregate function to data frame in pandas

I want to create a dataframe from an aggregate function. I expected the result to be a dataframe by default, as this solution states (Converting a Pandas GroupBy object to DataFrame), but I get a series instead and I don't know why.

The dataframe is from Kaggle's San Francisco Salaries. My code:

import pandas as pd
df = pd.read_csv('Salaries.csv')

in: type(df)
out: pandas.core.frame.DataFrame

in: df.head()
out: EmployeeName   JobTitle    TotalPay    TotalPayBenefits    Year    Status  2BasePay    2OvertimePay    2OtherPay   2Benefits   2Year
0   NATHANIEL FORD  GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY  567595.43   567595.43   2011    NaN 167411.18   0.00    400184.25   NaN 2011-01-01
1   GARY JIMENEZ    CAPTAIN III (POLICE DEPARTMENT) 538909.28   538909.28   2011    NaN 155966.02   245131.88   137811.38   NaN 2011-01-01
2   ALBERT PARDINI  CAPTAIN III (POLICE DEPARTMENT) 335279.91   335279.91   2011    NaN 212739.13   106088.18   16452.60    NaN 2011-01-01
3   CHRISTOPHER CHONG   WIRE ROPE CABLE MAINTENANCE MECHANIC    332343.61   332343.61   2011    NaN 77916.00    56120.71    198306.90   NaN 2011-01-01
4   PATRICK GARDNER DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)    326373.19   326373.19   2011    NaN 134401.60   9737.00 182234.59   NaN 2011-01-01

in: df2=df.groupby(['JobTitle'])['TotalPay'].mean()
type(df2)
out: pandas.core.series.Series

I want df2 to be a dataframe with the columns 'JobTitle' and 'TotalPay'.

Upvotes: 1

Views: 4393

Answers (1)

piRSquared

Reputation: 294338

Breaking down your code:

df2 = df.groupby(['JobTitle'])['TotalPay'].mean()

The groupby is fine. The misstep is the ['TotalPay']. It tells the groupby to execute the mean function only on the pd.Series df['TotalPay'] for each group defined by ['JobTitle'], so the result is a pd.Series. Instead, you want to refer to this column with [['TotalPay']]. Notice the double brackets. Selecting with a list of column names gives you a pd.DataFrame, so the aggregated result is a pd.DataFrame as well.
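To see the difference without the Kaggle file, here is a minimal sketch on a small made-up frame. The rows and values are hypothetical stand-ins for Salaries.csv, and the names as_series and as_frame are only for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for Salaries.csv
df = pd.DataFrame({
    "JobTitle": ["CLERK", "CLERK", "NURSE"],
    "TotalPay": [100.0, 200.0, 300.0],
})

# Single brackets select one column, so the aggregate is a Series
as_series = df.groupby(["JobTitle"])["TotalPay"].mean()

# Double brackets select a list of columns, so the aggregate is a DataFrame
as_frame = df.groupby(["JobTitle"])[["TotalPay"]].mean()

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

In both cases 'JobTitle' ends up as the index of the result rather than as a regular column.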


Recap

df2 = df.groupby(['JobTitle'])[['TotalPay']].mean()
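One caveat: with the double brackets, 'JobTitle' is the index of the result, not a column. If you want both 'JobTitle' and 'TotalPay' as regular columns, as the question asks, one possible approach is to call reset_index() afterwards or to pass as_index=False to groupby. The sketch below uses hypothetical toy data in place of Salaries.csv, and df3 is just an illustrative name.

```python
import pandas as pd

# Hypothetical toy data standing in for Salaries.csv
df = pd.DataFrame({
    "JobTitle": ["CLERK", "CLERK", "NURSE"],
    "TotalPay": [100.0, 200.0, 300.0],
})

# Option 1: aggregate to a DataFrame, then move the index into a column
df2 = df.groupby(["JobTitle"])[["TotalPay"]].mean().reset_index()

# Option 2: ask groupby to keep the grouping key as a column
df3 = df.groupby("JobTitle", as_index=False)["TotalPay"].mean()

print(df2.columns.tolist())
print(df3.columns.tolist())
```

Either way the result should be a DataFrame with a default integer index and the two columns side by side.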

Upvotes: 5
