Reputation: 169
I found various solutions to similar problems, but none of them gave me the correct answer, so now I have to ask:
I want to find the last day of each month in a list... Every time I get the following error:
AttributeError: 'RangeIndex' object has no attribute 'month'
I started like this:
temp=pandas.read_csv(filename, delimiter=',')
dates=pandas.to_datetime(temp['Var1'])
and I get a Series like this:
0 2017-01-01 06:00:00
1 2017-01-01 07:00:00
2 2017-01-01 08:00:00
...
Name: Var1, Length: 4053, dtype: datetime64[ns]
Some days are missing from the list; for example, a month may contain only 10 days.
I used the following code:
dates[dates.groupby(dates.index.month).apply(lambda s: np.max(s.index))].tolist()
Upvotes: 4
Views: 2303
Reputation: 999
Solutions that group only by the .month value merge the same month from different years into one group. Grouping by both .year and .month avoids this. I'm not sure if this is what you require:
import pandas as pd
# pd.datetime was removed in pandas 2.0; a date string works as the start value
df = pd.DataFrame(dict(dates=pd.date_range(start='2017-10-01', periods=20, freq='3W')))
Out[]:
dates
0 2017-10-01
1 2017-10-22
2 2017-11-12
3 2017-12-03
4 2017-12-24
5 2018-01-14
6 2018-02-04
7 2018-02-25
8 2018-03-18
9 2018-04-08
10 2018-04-29
11 2018-05-20
12 2018-06-10
13 2018-07-01
14 2018-07-22
15 2018-08-12
16 2018-09-02
17 2018-09-23
18 2018-10-14
19 2018-11-04
df.groupby([df.dates.dt.year, df.dates.dt.month]).max()
Out[]:
dates
dates dates
2017 10 2017-10-22
11 2017-11-12
12 2017-12-24
2018 1 2018-01-14
2 2018-02-25
3 2018-03-18
4 2018-04-29
5 2018-05-20
6 2018-06-10
7 2018-07-22
8 2018-08-12
9 2018-09-23
10 2018-10-14
11 2018-11-04
df.groupby([df.dates.dt.year, df.dates.dt.month]).max().values
Out[]:
array([['2017-10-22T00:00:00.000000000'],
['2017-11-12T00:00:00.000000000'],
['2017-12-24T00:00:00.000000000'],
['2018-01-14T00:00:00.000000000'],
['2018-02-25T00:00:00.000000000'],
['2018-03-18T00:00:00.000000000'],
['2018-04-29T00:00:00.000000000'],
['2018-05-20T00:00:00.000000000'],
['2018-06-10T00:00:00.000000000'],
['2018-07-22T00:00:00.000000000'],
['2018-08-12T00:00:00.000000000'],
['2018-09-23T00:00:00.000000000'],
['2018-10-14T00:00:00.000000000'],
['2018-11-04T00:00:00.000000000']], dtype='datetime64[ns]')
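To make the year issue concrete, here is a minimal sketch comparing the two groupings. The three dates are made up for illustration (two Octobers from different years); the variable names are also just placeholders.

```python
import pandas as pd

# Made-up dates: two in October 2017, one in October 2018.
s = pd.Series(pd.to_datetime(["2017-10-01", "2017-10-22", "2018-10-14"]))

# Grouping by month only puts both Octobers into a single group,
# so the last date of October 2017 is lost.
by_month = s.groupby(s.dt.month).max()

# Grouping by year and month keeps the two Octobers apart.
by_year_month = s.groupby([s.dt.year, s.dt.month]).max()

print(by_month)
print(by_year_month)
```

With month-only grouping you get one row back; with year and month you get one row per calendar month that actually appears in the data.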
Upvotes: 2
Reputation: 71610
Try this:
import pandas as pd
df = pd.DataFrame({'a':['2017-01-01 06:00:00',
'2017-01-01 07:00:00',
'2017-02-02 08:00:00']})
df['a'] = pd.to_datetime(df['a'])
print(df['a'].groupby(df.a.dt.to_period("M")).apply(lambda x: x.values.max()))
Output:
a
2017-01 2017-01-01 07:00:00
2017-02 2017-02-02 08:00:00
Freq: M, Name: a, dtype: datetime64[ns]
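The same idea should carry over to a Series shaped like the one in the question. As far as I can tell, the AttributeError comes from dates.index being a plain RangeIndex; the datetimes are the values of the Series, so they are reached through the .dt accessor rather than through the index. A rough sketch, with made-up timestamps standing in for temp['Var1']:

```python
import pandas as pd

# Made-up timestamps standing in for temp['Var1'] from the question.
dates = pd.Series(pd.to_datetime([
    "2017-01-01 06:00:00",
    "2017-01-10 07:00:00",
    "2017-02-02 08:00:00",
    "2018-01-05 09:00:00",
]), name="Var1")

# dates.index is a RangeIndex (0, 1, 2, ...), so it has no .month.
# The .dt accessor works on the datetime values instead.
# to_period("M") encodes year and month together, so January 2017
# and January 2018 end up in separate groups.
last_per_month = dates.groupby(dates.dt.to_period("M")).max()

print(last_per_month.tolist())
```

This returns the latest timestamp present in each month, which is not necessarily the calendar month end when days are missing.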
Upvotes: 2