FooBar

Reputation: 16478

Create Datetime from year and month hidden in multi-index

I have a DataFrame where year and month are hidden in the MultiIndex. I want to create a datetime index as an additional column (or as a separate Series with the same index).

                                           price            
                                            mean     mom_2  
foo          bar              year month                                      
997182819645 11               2010 1      1.1900  3.000000  
                                   2      2.2625  4.001769  

I thought of concatenating the two index levels as strings and then passing the result to pd.to_datetime(). However, the concatenation fails: I can add the two levels as integers just fine, but adding them as strings raises an error:

In[193]: df.index.get_level_values('year').values.astype(str)
Out[193]: 

array(['2010', '2010', '2010', ..., '2014', '2014', '2014'], 
      dtype='<U21')
In[194]: df.index.get_level_values('month').values.astype(str)
Out[194]: 

array(['1', '2', '3', ..., '10', '11', '12'], 
      dtype='<U21')
In[195]: df.index.get_level_values('month').values.astype(str) + df.index.get_level_values('year').values.astype(str)

TypeError: ufunc 'add' did not contain a loop with signature matching types
 dtype('<U21') dtype('<U21') dtype('<U21')

How can I create the datetime index here?

Upvotes: 2

Views: 2189

Answers (1)

jezrael

Reputation: 862511

I think you can use to_datetime, but first you need to multiply the year and month values so that they combine into a single YYYYMMDD integer (with the day fixed to 1):

y = df.index.get_level_values('year')
m = df.index.get_level_values('month')

df['Date'] = pd.to_datetime(y * 10000 + m * 100 + 1, format="%Y%m%d")
print (df)
                              price                 Date
                                foo       bar           
foo          bar year month                             
997182819645 11  2010 1      1.1900  3.000000 2010-01-01
                      2      2.2625  4.001769 2010-02-01
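To make the arithmetic concrete, here is a minimal self-contained sketch. The toy frame is only an assumption that mimics the layout in the question, and the names encoded and date_series are made up for illustration. It also covers the "separate Series with the same index" variant, and it casts the integers to strings explicitly before parsing, which is a cautious choice rather than a requirement.

```python
import pandas as pd

# Hypothetical toy frame mimicking the question's layout.
idx = pd.MultiIndex.from_tuples(
    [(997182819645, 11, 2010, 1), (997182819645, 11, 2010, 2)],
    names=["foo", "bar", "year", "month"],
)
cols = pd.MultiIndex.from_tuples([("price", "mean"), ("price", "mom_2")])
df = pd.DataFrame([[1.19, 3.0], [2.2625, 4.001769]], index=idx, columns=cols)

y = df.index.get_level_values("year")
m = df.index.get_level_values("month")

# 2010 * 10000 + 1 * 100 + 1 == 20100101, i.e. YYYYMMDD with the day set to 1.
encoded = y * 10000 + m * 100 + 1

# Cast to str so the format string is matched against text, not integers.
dates = pd.to_datetime(encoded.astype(str), format="%Y%m%d")

# A separate Series that shares the original MultiIndex; df is left untouched.
date_series = pd.Series(dates.values, index=df.index, name="Date")
print(date_series)
```

Keeping the result in its own Series avoids adding a column to a frame whose columns are themselves a MultiIndex.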

If needed, you can then append the column to the index:

df['Date'] = pd.to_datetime(y * 10000 + m * 100 + 1, format="%Y%m%d")
df.set_index('Date', append=True, inplace=True)
print (df)
                                         price          
                                           foo       bar
foo          bar year month Date                        
997182819645 11  2010 1     2010-01-01  1.1900  3.000000
                      2     2010-02-01  2.2625  4.001769

Another solution is to build a new DataFrame with year, month and day columns and pass it to to_datetime. This needs pandas 0.18.1 or newer:

y = df.index.get_level_values('year')
m = df.index.get_level_values('month')
d = pd.Index(len(df.index) * [1], name='day')
df1 = pd.DataFrame({'year':y, 'month':m, 'day':d}, index=df.index)

df['Date']  = pd.to_datetime(df1)
print (df)
                              price                 Date
                                foo       bar           
foo          bar year month                             
997182819645 11  2010 1      1.1900  3.000000 2010-01-01
                      2      2.2625  4.001769 2010-02-01
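The string route from the question can also be made to work. The sketch below is an assumption about one way to do it, not the only one: it keeps the levels as pandas Index objects instead of calling .values, so that + concatenates element-wise, and it pads the month to two digits. The small index and the name stamp are made up for the example. If you prefer to stay with NumPy arrays, np.char.add is the element-wise string concatenation there.

```python
import pandas as pd

# Hypothetical index with only the two levels that matter here.
idx = pd.MultiIndex.from_tuples(
    [(2010, 1), (2010, 2), (2014, 12)], names=["year", "month"]
)

# Stay with pandas Index objects (no .values) so "+" joins the strings.
y = idx.get_level_values("year").astype(str)
m = idx.get_level_values("month").astype(str)

# zfill pads single-digit months: "1" becomes "01".
stamp = y + "-" + m.str.zfill(2)

# Parse "YYYY-MM"; the day defaults to the first of the month.
dates = pd.to_datetime(stamp, format="%Y-%m")
print(dates)
```

Putting the year first and using a separator keeps the strings unambiguous, whereas month + year without padding (as in the question) would give values like "12010".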

Upvotes: 3
