Reputation: 2508
I work with Pandas dataframe.I want to aggregate data by one column and after that to summarize other columns.You can see example below:
data = {'name': ['Company1', 'Company2', 'Company1', 'Company2', 'Company5'],
'income': [0, 180395, 4543168, 7543168, 73],
'turnover': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, columns = ['name', 'income', 'turnover'])
df
INCOME_GROUPED = df.groupby(['name']).agg({'income':sum,'turnover':sum})
So this code above work well and give good result. Now next step is selection. I want to select only to columns from INCOME_GROUPED dataframe.
INCOME_SELECT = INCOME_GROUPED[['name','income']]
But after execution this line of code I got this error:
"None of [Index(['name', 'income'], dtype='object')] are in the [columns]"
So can anybody help me how to solve this problem ?
Upvotes: 1
Views: 60
Reputation: 18315
In this line
INCOME_SELECT = WAGE_GROUPED[['name','income']]
you probably meant INCOME_GROUPED
instead of WAGE_GROUPED
. Also, uppercase variable names are frowned upon unless global constants.
This only solves "income"
being available as a column. By default, groupby
makes the grouper column the index in the result; this is changable by passing as_index=False
to it:
income_grouped = (df.groupby("name", as_index=False)
.agg({"income": "sum", "turnover": "sum"})
income_selected = income_grouped[["name", "income"]].copy()
Note also the copy
at the very end to avoid the infamous SettingWithCopyWarning in possible future manipulations.
Last note is that since you aggregated all columns and did it with the same method ("sum"), you can go for
income_grouped = df.groupby("name", as_index=False).sum()
Upvotes: 1
Reputation:
You need to call reset_index()
after agg()
:
INCOME_GROUPED = df.groupby(['name']).agg({'income':sum,'turnover':sum}).reset_index()
# ^^^^^^^^^^^^^^ add this
Output:
>>> INCOME_GROUPED[['name', 'income']]
name income
0 Company1 4543168
1 Company2 7723563
2 Company5 73
Upvotes: 1