Alexis G

Adding index level to a dataframe

I have a dataframe indexed by datetime (see below), and I am looking to add an outer index level (see "Target" below) in which every date is paired with each value of First_column.

First_column = ['s0000', 's0001', 's0002', 's0003', 's0004', ...]

Does anyone have an idea of how to proceed?

Thank you very much. Alexis

My dataframe:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 17544 entries, 2015-01-01 00:00:00 to 2016-12-31 23:00:00
Data columns (total 12 columns):

Target:

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 996000 entries, (s0000, 2015-01-01 00:00:00) to (s0999, 2012-12-31 00:00:00)
Data columns (total 8 columns):

SCENARIO DATE

s0000    2015-02-28
         2015-03-03 
         2015-03-04
         2015-03-05
         2015-03-06
         2015-03-07
         2015-03-10
         2015-03-11
         2015-03-12
         2015-03-13
s0001    2015-02-28
         2015-03-03 
         2015-03-04
         2015-03-05
         2015-03-06
         2015-03-07
         2015-03-10
         2015-03-11
         2015-03-12
         2015-03-13
s0002    2015-02-28
         2015-03-03 
         2015-03-04
         2015-03-05
         2015-03-06
         2015-03-07
         2015-03-10
         2015-03-11
         2015-03-12
         2015-03-13
s0003    ...
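The target is every (scenario, date) pair, scenario-major: the Cartesian product of First_column and the original dates. A minimal sketch of building that layout directly, assuming a small hypothetical frame (two made-up columns `a` and `b`, six daily rows, three scenarios) in place of the real 17544-row one:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's hourly frame: two made-up
# columns and six daily rows instead of 12 columns and 17544 hours.
dates = pd.date_range('2015-01-01', periods=6, freq='D')
df = pd.DataFrame({'a': range(6), 'b': range(10, 16)}, index=dates)

first_column = ['s0000', 's0001', 's0002']

# Every (scenario, date) pair, scenario-major, with the target's level names.
target_index = pd.MultiIndex.from_product([first_column, df.index],
                                          names=['SCENARIO', 'DATE'])

# One copy of the data per scenario, stacked so rows line up with target_index.
target = pd.DataFrame(np.tile(df.to_numpy(), (len(first_column), 1)),
                      index=target_index, columns=df.columns)
print(target.head(8))
```

`np.tile` repeats the whole block of rows once per scenario, which matches the order `from_product` produces (all dates for s0000, then all dates for s0001, and so on).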

Answers (2)

Greg

You could do something like this...

import pandas as pd

first_col = ['s0001', 's0002', 's0003', 's0004']

# Make your datetime index
dt_index = pd.date_range('2015-2-27', freq='B', periods=10)

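# Outer level: each label repeated once per date (s0001 x10, then s0002 x10, ...)
first_col_index = len(dt_index)*first_col
first_col_index.sort()

# Inner level: the full date range repeated once per label, so it lines up
# with first_col_index (dt_index.repeat(len(first_col)) would instead give
# d0, d0, d0, d0, d1, ... and pair each label with the wrong dates)
date_index = list(dt_index) * len(first_col)

# Make a dataframe with a hierarchical index
df = pd.DataFrame(range(len(first_col)*len(dt_index)),
                  index=[first_col_index, date_index])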
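A less error-prone variant of the same idea, offered as an alternative sketch rather than as Greg's code: `pd.MultiIndex.from_product` generates the label-major pairing itself, so the two levels cannot drift out of step. The `value` column name is only a placeholder.

```python
import pandas as pd

first_col = ['s0001', 's0002', 's0003', 's0004']
dt_index = pd.date_range('2015-2-27', freq='B', periods=10)

# from_product pairs every label with every date, label-major, so there is
# no need to build and align the two level arrays by hand.
index = pd.MultiIndex.from_product([first_col, dt_index])

# 'value' is a placeholder column name for illustration.
df = pd.DataFrame({'value': range(len(index))}, index=index)
print(df.head(12))
```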

unutbu

You could use pd.concat with the keys parameter:

import pandas as pd
df = pd.DataFrame(range(10), index=pd.date_range('2015-2-27', freq='B', periods=10))
#             0
# 2015-02-27  0
# 2015-03-02  1
# 2015-03-03  2
# 2015-03-04  3
# 2015-03-05  4
# 2015-03-06  5
# 2015-03-09  6
# 2015-03-10  7
# 2015-03-11  8
# 2015-03-12  9
first_col = ['s{:04d}'.format(i) for i in range(1,5)]
# ['s0001', 's0002', 's0003', 's0004']

newdf = pd.concat([df]*len(first_col), keys=first_col)
print(newdf)

yields

                  0
s0001 2015-02-27  0
      2015-03-02  1
      2015-03-03  2
      2015-03-04  3
      2015-03-05  4
      2015-03-06  5
      2015-03-09  6
      2015-03-10  7
      2015-03-11  8
      2015-03-12  9
s0002 2015-02-27  0
      2015-03-02  1
      2015-03-03  2
      2015-03-04  3
      2015-03-05  4
      2015-03-06  5
      2015-03-09  6
      2015-03-10  7
      2015-03-11  8
      2015-03-12  9
s0003 2015-02-27  0
      2015-03-02  1
      2015-03-03  2
      2015-03-04  3
      2015-03-05  4
      2015-03-06  5
      2015-03-09  6
      2015-03-10  7
      2015-03-11  8
      2015-03-12  9
s0004 2015-02-27  0
      2015-03-02  1
      2015-03-03  2
      2015-03-04  3
      2015-03-05  4
      2015-03-06  5
      2015-03-09  6
      2015-03-10  7
      2015-03-11  8
      2015-03-12  9

Happily, I just learned this yesterday from Joris.
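`pd.concat` also accepts a `names` parameter that labels the levels of the resulting MultiIndex. A short sketch reusing the example above, assuming you want the SCENARIO / DATE level names shown in the question's target:

```python
import pandas as pd

df = pd.DataFrame(range(10), index=pd.date_range('2015-2-27', freq='B', periods=10))
first_col = ['s{:04d}'.format(i) for i in range(1, 5)]

# names= labels the new outer level (from keys) and the existing date level,
# matching the SCENARIO / DATE headers in the question's target.
newdf = pd.concat([df] * len(first_col), keys=first_col, names=['SCENARIO', 'DATE'])
print(newdf.loc['s0002'].head(3))
```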
