S.Kociok

Reputation: 169

How to get the last day of each month in a list of dates

I have found several solutions to my problem, but none of them gives the correct answer, so now I have to ask:

I want to find the last day of each month in a list of dates. Every time I try, I get the following error:

AttributeError: 'RangeIndex' object has no attribute 'month'

I started like this:

import pandas
temp = pandas.read_csv(filename, delimiter=',')
dates = pandas.to_datetime(temp['Var1'])

and I get a Series like this:

0      2017-01-01 06:00:00
1      2017-01-01 07:00:00
2      2017-01-01 08:00:00
...
Name: Var1, Length: 4053, dtype: datetime64[ns]

Some days are missing from the list; for example, a month may contain only 10 days.

I used the following code:

dates[dates.groupby(dates.index.month).apply(lambda s: np.max(s.index))].tolist()

Upvotes: 4

Views: 2303

Answers (2)

Dillon

Reputation: 999

Solutions that group only on the .month value merge the same month from different years into one group. Grouping on both .year and .month avoids this. I'm not sure whether this is what you need:

import pandas as pd
# pd.datetime is no longer available in recent pandas; a date string works as the start
df = pd.DataFrame(dict(dates=pd.date_range(start='2017-10-01', periods=20, freq='3W')))
df
Out[]:
        dates
0  2017-10-01
1  2017-10-22
2  2017-11-12
3  2017-12-03
4  2017-12-24
5  2018-01-14
6  2018-02-04
7  2018-02-25
8  2018-03-18
9  2018-04-08
10 2018-04-29
11 2018-05-20
12 2018-06-10
13 2018-07-01
14 2018-07-22
15 2018-08-12
16 2018-09-02
17 2018-09-23
18 2018-10-14
19 2018-11-04


df.groupby([df.dates.dt.year, df.dates.dt.month]).max()
Out[]:
                 dates
dates dates           
2017  10    2017-10-22
      11    2017-11-12
      12    2017-12-24
2018  1     2018-01-14
      2     2018-02-25
      3     2018-03-18
      4     2018-04-29
      5     2018-05-20
      6     2018-06-10
      7     2018-07-22
      8     2018-08-12
      9     2018-09-23
      10    2018-10-14
      11    2018-11-04

df.groupby([df.dates.dt.year, df.dates.dt.month]).max().values
Out[]:
array([['2017-10-22T00:00:00.000000000'],
       ['2017-11-12T00:00:00.000000000'],
       ['2017-12-24T00:00:00.000000000'],
       ['2018-01-14T00:00:00.000000000'],
       ['2018-02-25T00:00:00.000000000'],
       ['2018-03-18T00:00:00.000000000'],
       ['2018-04-29T00:00:00.000000000'],
       ['2018-05-20T00:00:00.000000000'],
       ['2018-06-10T00:00:00.000000000'],
       ['2018-07-22T00:00:00.000000000'],
       ['2018-08-12T00:00:00.000000000'],
       ['2018-09-23T00:00:00.000000000'],
       ['2018-10-14T00:00:00.000000000'],
       ['2018-11-04T00:00:00.000000000']], dtype='datetime64[ns]')
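
To illustrate the point about years, here is a minimal sketch comparing the two groupings. The sample timestamps and the variable names are made up for illustration and are not from the question's data:

```python
import pandas as pd

# Hypothetical sample data: January appears in both 2017 and 2018.
dates = pd.Series(pd.to_datetime([
    "2017-01-05", "2017-01-20", "2017-02-10",
    "2018-01-03", "2018-01-15",
]))

# Month only: both Januaries collapse into a single group.
by_month = dates.groupby(dates.dt.month).max()

# Year and month: each calendar month stays separate.
by_year_month = dates.groupby([dates.dt.year, dates.dt.month]).max()

print(len(by_month))       # number of groups when keyed on month only
print(len(by_year_month))  # number of groups when keyed on (year, month)
```

With month-only grouping, the January 2017 maximum is lost because the January 2018 dates are later and fall in the same group.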

Upvotes: 2

U13-Forward

Reputation: 71610

Try this:

import pandas as pd
df = pd.DataFrame({'a':['2017-01-01 06:00:00',
                        '2017-01-01 07:00:00',
                        '2017-02-02 08:00:00']})
df['a'] = pd.to_datetime(df['a'])
print(df['a'].groupby(df.a.dt.to_period("M")).apply(lambda x: x.values.max()))

Output:

a
2017-01   2017-01-01 07:00:00
2017-02   2017-02-02 08:00:00
Freq: M, Name: a, dtype: datetime64[ns]
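
As for the AttributeError in the question: `dates.index` is a plain RangeIndex, so it has no `.month`; the datetime values live in the Series itself and are reached through `.dt`. A minimal sketch of the same idea applied to a Series shaped like the one in the question, assuming a small made-up stand-in for the `Var1` column:

```python
import pandas as pd

# Stand-in for pandas.to_datetime(temp['Var1']); the values are illustrative.
dates = pd.Series(pd.to_datetime([
    "2017-01-01 06:00:00",
    "2017-01-10 07:00:00",
    "2017-02-02 08:00:00",
    "2017-02-08 09:00:00",
]), name="Var1")

# Group on the values (via .dt), not on the RangeIndex, and take the
# latest timestamp that is present in each year-month period.
last_per_month = dates.groupby(dates.dt.to_period("M")).max()

print(last_per_month.tolist())
```

This returns the last timestamp that actually occurs in each month, which seems to be what is wanted when some days are missing.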

Upvotes: 2
