ker_laeda86

Reputation: 287

How to get the first available day of each month in a DatetimeIndex when the month doesn't start on the 1st?

I have a DataFrame with a DatetimeIndex covering 10 years of daily data. I need to extract the rows that correspond to the first day of each month. However, not every month starts with 01; some start with 02, 03, 04, etc.

2020-01-02
2020-01-03
...
2020-01-31
2020-02-03
...
2020-02-29
2020-03-02

The expected DataFrame is:

2020-01-02
2020-02-03
2020-03-02

Any suggestions on how to do that?

Upvotes: 3

Views: 2287

Answers (1)

jezrael

Reputation: 862481

Use DatetimeIndex.to_period to convert the index to monthly periods, then flag repeated months with Index.duplicated and filter by boolean indexing with the inverted mask, which keeps the first row of each month:

# sort first if the index is not already in chronological order
df = df.sort_index()

print (df)
            A
date         
2020-01-02  4
2020-01-03  9
2020-01-31  2
2020-02-03  7
2020-02-29  3
2020-03-02  1

df1 = df[~df.index.to_period('M').duplicated()]
print (df1)
            A
date         
2020-01-02  4
2020-02-03  7
2020-03-02  1

Details:

print (df.index.to_period('M'))
PeriodIndex(['2020-01', '2020-01', '2020-01', '2020-02', '2020-02', '2020-03'], 
            dtype='period[M]', name='date', freq='M')

print (df.index.to_period('M').duplicated())
[False  True  True False  True False]

print (~df.index.to_period('M').duplicated())
[ True False False  True False  True]
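
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The construction of the sample frame is an assumption (the answer only shows its printed form), and the names months and first_rows are illustrative:

```python
import pandas as pd

# Sample data mirroring the frame printed in the answer.
df = pd.DataFrame(
    {"A": [4, 9, 2, 7, 3, 1]},
    index=pd.DatetimeIndex(
        ["2020-01-02", "2020-01-03", "2020-01-31",
         "2020-02-03", "2020-02-29", "2020-03-02"],
        name="date",
    ),
)

# Map each timestamp to its calendar month.
months = df.index.to_period("M")

# duplicated() marks every repeat of a month as True,
# so the inverted mask keeps only the first row seen per month.
first_rows = df[~months.duplicated()]
print(first_rows)
```

Because the mask depends only on the index, the same line should work unchanged for a frame with any number of columns.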

Another solution is to use GroupBy.head:

df1 = df.groupby(df.index.to_period('M')).head(1)
print (df1)
            A
date         
2020-01-02  4
2020-02-03  7
2020-03-02  1
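
Both solutions take the first row per month in row order, which is why the index should be sorted beforehand. The sketch below illustrates this with a hypothetical unsorted frame (the data and the names raw, unsorted_pick and sorted_pick are made up for the example):

```python
import pandas as pd

# Hypothetical unsorted input: 2020-01-31 appears before 2020-01-02.
raw = pd.DataFrame(
    {"A": [2, 4, 7, 3]},
    index=pd.DatetimeIndex(
        ["2020-01-31", "2020-01-02", "2020-02-03", "2020-02-29"],
        name="date",
    ),
)

# Without sorting, "first" means first in row order,
# so January yields the 31st rather than the 2nd.
unsorted_pick = raw.groupby(raw.index.to_period("M")).head(1)

# After sorting, the first row per month is the earliest available date.
ordered = raw.sort_index()
sorted_pick = ordered.groupby(ordered.index.to_period("M")).head(1)

print(unsorted_pick)
print(sorted_pick)
```

If the data is already in chronological order, as in the question, the sort_index call can be skipped.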

Upvotes: 4
