Specifying dataframe index as date at time of creation

Question

I'm a newcomer to Python (3.5) from an R background and I'm struggling a little with the differences in the way dataframes are created and used. In particular I want to create a dataframe using a series of dates for the index. The following experimental code (note the commented out index) works more or less as I expect:

import pandas as pd
import numpy as np

np.random.seed(123456)
num_periods=5
monthindex=pd.date_range('1/1/2014', periods=num_periods, freq='MS')
dd = pd.DataFrame(data={'date':monthindex,
                        'c1': pd.Series(np.random.uniform(10, 20, size=num_periods)),
                        'c2': pd.Series(np.random.uniform(30, 40, size=num_periods))},
                  # index=monthindex,
)
print(dd)

...and gives me this output:

          c1         c2       date
0  11.269698  33.362217 2014-01-01
1  19.667178  34.513765 2014-02-01
2  12.604760  38.402551 2014-03-01
3  18.972365  31.231021 2014-04-01
4  13.767497  35.430262 2014-05-01

...and I can specify the index after creation like this:

dd.index = monthindex
print(dd)

...which gets me this, which looks right:

                   c1         c2       date
2014-01-01  11.269698  33.362217 2014-01-01
2014-02-01  19.667178  34.513765 2014-02-01
2014-03-01  12.604760  38.402551 2014-03-01
2014-04-01  18.972365  31.231021 2014-04-01
2014-05-01  13.767497  35.430262 2014-05-01

But if I uncomment the index argument in the code above, I get the dates in the index, but the c1 and c2 columns are filled with NaN values, like this:

            c1  c2       date
2014-01-01 NaN NaN 2014-01-01
2014-02-01 NaN NaN 2014-02-01
2014-03-01 NaN NaN 2014-03-01
2014-04-01 NaN NaN 2014-04-01
2014-05-01 NaN NaN 2014-05-01

I suspect it is because the two Series objects don't share any index labels with monthindex, but I don't really understand what is going on.

What is happening and how should I specify a date index during the creation of the dataframe rather than tacking it on after the call to DataFrame?
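A quick way to test that suspicion is to look at the index pandas gives a Series built from a bare array, and at what happens when that Series is conformed to the date index. This is a minimal sketch reusing the question's seed and date range; the names s, aligned and df are just for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(123456)
num_periods = 5
monthindex = pd.date_range('1/1/2014', periods=num_periods, freq='MS')

# A Series built from a bare array gets a default integer index covering 0..4
s = pd.Series(np.random.uniform(10, 20, size=num_periods))
print(s.index)

# Conforming it to the date index looks up each date among the labels 0..4;
# none are found, so every value becomes NaN
aligned = s.reindex(monthindex)
print(aligned)

# Passing a dict of Series plus an explicit index to the DataFrame
# constructor performs the same label alignment, hence the NaN columns
df = pd.DataFrame({'c1': s}, index=monthindex)
print(df)
```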

Mike Müller · Accepted Answer

Using the NumPy arrays directly without creating Series first works:

import pandas as pd
import numpy as np

np.random.seed(123456)
num_periods=5
monthindex=pd.date_range('1/1/2014', periods=num_periods, freq='MS')
dd = pd.DataFrame(data={'date':monthindex,
                        'c1': np.random.uniform(10, 20, size=num_periods),
                        'c2': np.random.uniform(30, 40, size=num_periods)},
                  index=monthindex,
)
print(dd)

Output:

                   c1         c2       date
2014-01-01  11.269698  33.362217 2014-01-01
2014-02-01  19.667178  34.513765 2014-02-01
2014-03-01  12.604760  38.402551 2014-03-01
2014-04-01  18.972365  31.231021 2014-04-01
2014-05-01  13.767497  35.430262 2014-05-01

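The Series come with their own default integer indices (0 to 4), which do not match the month index. When you pass both Series and an index to the DataFrame constructor, pandas aligns each Series to that index by label; because none of the dates appear among the labels 0 to 4, every value becomes NaN. The NumPy arrays don't have an index, so their values are simply placed by position against the index you provide.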
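If you would rather keep building Series, here is a minimal sketch of two ways to avoid the mismatch: give each Series the same index up front, or drop the Series index with .to_numpy() so the values are placed by position. Note that .to_numpy() needs pandas 0.24 or later; on older versions .values does the same job. The names dd1, dd2, c1 and c2 are just for illustration.

```python
import numpy as np
import pandas as pd

num_periods = 5
monthindex = pd.date_range('1/1/2014', periods=num_periods, freq='MS')

# Option 1: build each Series on the month index, so label alignment
# matches every date and no NaN values appear
np.random.seed(123456)
dd1 = pd.DataFrame(
    data={
        'date': monthindex,
        'c1': pd.Series(np.random.uniform(10, 20, size=num_periods), index=monthindex),
        'c2': pd.Series(np.random.uniform(30, 40, size=num_periods), index=monthindex),
    },
    index=monthindex,
)

# Option 2: keep the default-indexed Series but pass their raw values,
# which carry no index and are therefore placed positionally
np.random.seed(123456)  # reseed so the values match option 1
c1 = pd.Series(np.random.uniform(10, 20, size=num_periods))
c2 = pd.Series(np.random.uniform(30, 40, size=num_periods))
dd2 = pd.DataFrame(
    data={'date': monthindex, 'c1': c1.to_numpy(), 'c2': c2.to_numpy()},
    index=monthindex,
)

print(dd1)
print(dd2)
```

Option 1 keeps alignment working for you, which matters if the Series ever arrive in a different order; option 2 is the same idea as the answer above, just applied to existing Series.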
