silent_hunter
silent_hunter

Reputation: 2508

Selection of columns

I work with Pandas dataframe.I want to aggregate data by one column and after that to summarize other columns.You can see example below:

    data = {'name': ['Company1', 'Company2', 'Company1', 'Company2', 'Company5'], 
            'income': [0, 180395, 4543168, 7543168, 73], 
            'turnover': [4, 24, 31, 2, 3]}
    df = pd.DataFrame(data, columns = ['name', 'income', 'turnover'])
    df

INCOME_GROUPED = df.groupby(['name']).agg({'income':sum,'turnover':sum})

So this code above work well and give good result. Now next step is selection. I want to select only to columns from INCOME_GROUPED dataframe.

INCOME_SELECT =  INCOME_GROUPED[['name','income']]

But after execution this line of code I got this error:

"None of [Index(['name', 'income'], dtype='object')] are in the [columns]"

So can anybody help me how to solve this problem ?

Upvotes: 1

Views: 60

Answers (2)

Mustafa Aydın
Mustafa Aydın

Reputation: 18315

In this line

INCOME_SELECT = WAGE_GROUPED[['name','income']]

you probably meant INCOME_GROUPED instead of WAGE_GROUPED. Also, uppercase variable names are frowned upon unless global constants.

This only solves "income" being available as a column. By default, groupby makes the grouper column the index in the result; this is changable by passing as_index=False to it:

income_grouped = (df.groupby("name", as_index=False)
                    .agg({"income": "sum", "turnover": "sum"})

income_selected = income_grouped[["name", "income"]].copy()

Note also the copy at the very end to avoid the infamous SettingWithCopyWarning in possible future manipulations.

Last note is that since you aggregated all columns and did it with the same method ("sum"), you can go for

income_grouped = df.groupby("name", as_index=False).sum()

Upvotes: 1

user17242583
user17242583

Reputation:

You need to call reset_index() after agg():

INCOME_GROUPED = df.groupby(['name']).agg({'income':sum,'turnover':sum}).reset_index()
#                                                                       ^^^^^^^^^^^^^^ add this

Output:

>>> INCOME_GROUPED[['name', 'income']]
       name   income
0  Company1  4543168
1  Company2  7723563
2  Company5       73

Upvotes: 1

Related Questions