user3812891
user3812891

Reputation: 41

Pandas Dataframes - How do you maintain an index post a group by/aggregation operation?

This should be easy but I'm having a surprisingly annoying time at it. The code below shows me doing a Pandas groupby operation so I can calculate variance by symbol. Unfortunately what happens is that the aggregation command seems to get rid of the integer index, so I am trying to create a new integer list and add this as a column to the table and set as a new index.

vardataframe = voldataframe.groupby('Symbol')
vardataframe = vardataframe.aggregate(np.var)
vardataframe['newindex']= np.arange(1,(len(vardataframe)+1))
vardataframe.set_index(['newindex'])
vardataframe = vardataframe.ix[:,['newindex','Symbol','volatility']]

However what comes out is the below vardataframe.head() result, which does not properly change the index of the table from Symbol back to numeric. And this hurts me in a line or two when I try to do a merge command.

        newindex  Symbol  volatility
Symbol                              
A              1     NaN    0.000249
AA             2     NaN    0.000413
AAIT           3     NaN    0.000237
AAL            4     NaN    0.001664
AAME           5     NaN    0.001283

As you see the problems with the above are now there are two Symbol columns and the index hasn't been set correctly. What I'd like to do is get rid of the second Symbol column and make newindex the new index. Anyone know what I'm doing wrong here? (Perhaps a misunderstanding of the ix command). Much appreciated!

Upvotes: 2

Views: 6355

Answers (2)

Happy001
Happy001

Reputation: 6383

You can use as_index=False to preserve integer index. You need only one line to do what you need:

vardataframe = voldataframe.groupby('Symbol', as_index=False).var()

A couple of things in your code:

vardataframe.set_index(['newindex'])

will set ndewindex as index, but returns a new dataframe which is not used. You can do vardataframe.set_index(['newindex'], inplace=True) if you want this.

vardataframe.ix[:,['newindex','Symbol','volatility']]

gives you a column Symbol of all NaN because Symbol is not a column of vardataframe, but only exists in its index. Querying a non-exist column with ix gives all NaN. As @user2600939 mentioned, you can do vardataframe.reset_index(inplace=True) (or vardataframe=vardataframe.reset_index() to put Symbol back as a column.

Upvotes: 3

ZJS
ZJS

Reputation: 4051

Instead of making a new index manually just reset it use...

df = df.reset_index()

Upvotes: 0

Related Questions