Reputation: 287
I have a DataFrame with a DatetimeIndex covering 10 years, day by day. I need to extract the rows that correspond to the first available day of each month. However, not every month starts with 01; some start with 02, 03, 04, etc.
2020-01-02
2020-01-03
...
2020-01-31
2020-02-03
...
2020-02-29
2020-03-02
The expected df is:
2020-01-02
2020-02-03
2020-03-02
Any suggestions on how to do that?
Upvotes: 3
Views: 2287
Reputation: 862481
Use DatetimeIndex.to_period to convert the index to monthly periods, then flag repeated months with Index.duplicated and filter by boolean indexing with the inverted mask, which keeps the first day of each month:
# sort first if the index is not already in chronological order
df = df.sort_index()
print (df)
A
date
2020-01-02 4
2020-01-03 9
2020-01-31 2
2020-02-03 7
2020-02-29 3
2020-03-02 1
df1 = df[~df.index.to_period('M').duplicated()]
print (df1)
A
date
2020-01-02 4
2020-02-03 7
2020-03-02 1
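For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the printed sample, and the variable names `is_repeat` and `first_days` are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the printed frame above.
idx = pd.DatetimeIndex(
    ["2020-01-02", "2020-01-03", "2020-01-31",
     "2020-02-03", "2020-02-29", "2020-03-02"],
    name="date",
)
df = pd.DataFrame({"A": [4, 9, 2, 7, 3, 1]}, index=idx)

# Map each timestamp to its month, then mark every repeat of a month as True.
is_repeat = df.index.to_period("M").duplicated()

# Invert the mask so only the first row seen for each month is kept.
first_days = df[~is_repeat]
print(first_days)
```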
Details:
print (df.index.to_period('M'))
PeriodIndex(['2020-01', '2020-01', '2020-01', '2020-02', '2020-02', '2020-03'],
dtype='period[M]', name='date', freq='M')
print (df.index.to_period('M').duplicated())
[False True True False True False]
print (~df.index.to_period('M').duplicated())
[ True False False True False True]
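Since duplicated() treats the first occurrence in positional order as the non-duplicate, the mask only picks the earliest date of each month when the index is sorted. A small sketch of the difference, using a deliberately shuffled copy of the sample dates (the shuffled order and the variable names are assumptions made for illustration):

```python
import pandas as pd

# Same kind of dates as in the sample, but deliberately out of order.
idx = pd.DatetimeIndex(
    ["2020-01-31", "2020-01-02", "2020-02-29", "2020-02-03", "2020-03-02"],
    name="date",
)
shuffled = pd.DataFrame({"A": [2, 4, 3, 7, 1]}, index=idx)

# Without sorting, the first *positional* row of each month is kept.
unsorted_pick = shuffled[~shuffled.index.to_period("M").duplicated()]

# After sorting, the first positional row is also the earliest date.
ordered = shuffled.sort_index()
sorted_pick = ordered[~ordered.index.to_period("M").duplicated()]

print(unsorted_pick)
print(sorted_pick)
```

This is why the sort_index() call at the top is worth keeping whenever the order of the index is not guaranteed.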
Another solution is to use GroupBy.head, which takes the first row of each monthly group:
df1 = df.groupby(df.index.to_period('M')).head(1)
print (df1)
A
date
2020-01-02 4
2020-02-03 7
2020-03-02 1
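As a quick sanity check, the two approaches can be run side by side on the sample data. This is only a sketch: the frame is rebuilt by hand, and `by_head` and `by_mask` are illustrative names.

```python
import pandas as pd

# Sample data copied from the printed frame above.
idx = pd.DatetimeIndex(
    ["2020-01-02", "2020-01-03", "2020-01-31",
     "2020-02-03", "2020-02-29", "2020-03-02"],
    name="date",
)
df = pd.DataFrame({"A": [4, 9, 2, 7, 3, 1]}, index=idx)

months = df.index.to_period("M")

# GroupBy.head(1) keeps the first row of every month group, original index intact.
by_head = df.groupby(months).head(1)

# The duplicated-mask approach from the first solution.
by_mask = df[~months.duplicated()]

print(by_head)
print(by_mask)
```

Both select rows by position within each month, so on a sorted index they should return the same rows.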
Upvotes: 4