Reputation: 61074
I'd like to start with the month 2019-01 and then add any number of consecutive months and use that as an index in a pandas dataframe. I've found suggestions that point to using pd.to_timedelta, but I keep bumping into problems.
Here are the details:
If you start with a date and add 5 periods like this:
import pandas as pd
import numpy as np
date = pd.to_datetime("1st of Jan, 2019")
dates = date+pd.to_timedelta(np.arange(5), 'M')
Then you get:
DatetimeIndex(['2019-01-01 00:00:00', '2019-01-31 10:29:06',
'2019-03-02 20:58:12', '2019-04-02 07:27:18',
'2019-05-02 17:56:24'],
dtype='datetime64[ns]', freq=None)
You can easily remove the day and time parts, and remove duplicates to handle the double 2019-01, like this:
dates = dates.map(lambda x: x.strftime('%Y-%m'))
dates = dates.drop_duplicates()
But as you can see, 2019-02 is missing:
Index(['2019-01', '2019-03', '2019-04', '2019-05'], dtype='object')
What is a better way to do this?
Upvotes: 0
Views: 1079
Reputation: 862601
You can create a PeriodIndex with period_range:
dates = pd.period_range(date, periods=5, freq='M')
print (dates)
PeriodIndex(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05'],
dtype='period[M]', freq='M')
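Since the goal is to use the months as a DataFrame index, here is a minimal sketch of how the PeriodIndex could be used that way. The column name "value" and its data are made up for illustration; the start month is passed as a string instead of the date variable above.

```python
import numpy as np
import pandas as pd

# Five consecutive monthly periods starting at 2019-01
months = pd.period_range("2019-01", periods=5, freq="M")

# Use the periods directly as the index ("value" is a hypothetical column)
df = pd.DataFrame({"value": np.arange(5)}, index=months)

# If plain strings are needed, format the periods
labels = months.strftime("%Y-%m")

# If timestamps are needed, convert to the first day of each month
starts = months.to_timestamp()
```

Keeping the index as periods avoids the day and time parts entirely, so there is nothing to strip or deduplicate afterwards.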
Your approach should work if you add 2 days:
dates = (date + pd.to_timedelta(np.arange(5), unit='M') + pd.Timedelta(2, unit='d')).strftime('%Y-%m')
print (dates)
Index(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05'], dtype='object')
Verify:
dates = (date + pd.to_timedelta(np.arange(120), unit='M') + pd.Timedelta(2, unit='d')).month.value_counts()
print (dates)
12 10
11 10
10 10
9 10
8 10
7 10
6 10
5 10
4 10
3 10
2 10
1 10
dtype: int64
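Some context on why the 2-day shift helps: the timestamps in the question advance by 30 days 10:29:06 per step, which is an average month length rather than a calendar month, so a short month like February can be skipped. As far as I know, recent pandas releases also reject unit='M' in to_timedelta altogether. A minimal sketch of a calendar-aware alternative using pd.DateOffset (my own suggestion, not part of the answer above):

```python
import pandas as pd

date = pd.Timestamp("2019-01-01")

# DateOffset(months=n) moves by whole calendar months, not a fixed duration
dates = pd.DatetimeIndex([date + pd.DateOffset(months=n) for n in range(5)])

# Keep only the year-month part
labels = dates.strftime("%Y-%m")
```

This lands on the first of each month, so no extra day shift or drop_duplicates step is needed.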
Upvotes: 3
Reputation: 18647
You could use pandas.date_range:
pd.date_range(date, periods=5, freq='M').strftime('%Y-%m')
[out]
Index(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05'], dtype='object')
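Note that freq='M' in date_range produces month-end dates (2019-01-31, 2019-02-28, ...). If month-start timestamps are preferred, a sketch using the 'MS' (month start) alias might look like this; the strftime result is the same either way:

```python
import pandas as pd

# 'MS' anchors each date to the first day of the month
starts = pd.date_range("2019-01-01", periods=5, freq="MS")

# Same year-month labels as with month-end dates
labels = starts.strftime("%Y-%m")
```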
Upvotes: 3