Reputation: 41
This should be easy but I'm having a surprisingly annoying time at it. The code below shows me doing a Pandas groupby operation so I can calculate variance by symbol. Unfortunately what happens is that the aggregation command seems to get rid of the integer index, so I am trying to create a new integer list and add this as a column to the table and set as a new index.
vardataframe = voldataframe.groupby('Symbol')
vardataframe = vardataframe.aggregate(np.var)
vardataframe['newindex']= np.arange(1,(len(vardataframe)+1))
vardataframe.set_index(['newindex'])
vardataframe = vardataframe.ix[:,['newindex','Symbol','volatility']]
However what comes out is the below vardataframe.head() result, which does not properly change the index of the table from Symbol back to numeric. And this hurts me in a line or two when I try to do a merge command.
newindex Symbol volatility
Symbol
A 1 NaN 0.000249
AA 2 NaN 0.000413
AAIT 3 NaN 0.000237
AAL 4 NaN 0.001664
AAME 5 NaN 0.001283
As you see the problems with the above are now there are two Symbol columns and the index hasn't been set correctly. What I'd like to do is get rid of the second Symbol column and make newindex the new index. Anyone know what I'm doing wrong here? (Perhaps a misunderstanding of the ix command). Much appreciated!
Upvotes: 2
Views: 6355
Reputation: 6383
You can use as_index=False
to preserve integer index. You need only one line to do what you need:
vardataframe = voldataframe.groupby('Symbol', as_index=False).var()
A couple of things in your code:
vardataframe.set_index(['newindex'])
will set ndewindex
as index, but returns a new dataframe which is not used. You can do vardataframe.set_index(['newindex'], inplace=True)
if you want this.
vardataframe.ix[:,['newindex','Symbol','volatility']]
gives you a column Symbol
of all NaN
because Symbol
is not a column of vardataframe
, but only exists in its index
. Querying a non-exist column with ix
gives all NaN
. As @user2600939 mentioned, you can do vardataframe.reset_index(inplace=True)
(or vardataframe=vardataframe.reset_index()
to put Symbol
back as a column.
Upvotes: 3
Reputation: 4051
Instead of making a new index manually just reset it use...
df = df.reset_index()
Upvotes: 0