Reputation: 13603
I have a DataFrame that contains the price of a ticker like this:
date price
2006-01-02 09:00:00 6548.34
2006-01-02 09:01:00 6457.61
2006-01-02 09:02:00 6452.82
2006-01-02 09:03:00 6452.39
2006-01-02 09:04:00 6451.61
...
2021-01-29 13:29:40 15229.54
2021-01-29 13:29:45 15229.54
2021-01-29 13:29:50 15229.54
2021-01-29 13:29:55 15229.54
2021-01-29 13:30:00 15138.31
How can I remove the hour, minute, and second parts of the date?
I want something like this:
date price
2006-01-02 6577.33
2006-01-03 6488.62
2006-01-04 6469.82
2006-01-05 6555.39
2006-01-06 6448.65
...
2021-01-25 15229.53
2021-01-26 15129.34
2021-01-27 14129.64
2021-01-28 15228.34
2021-01-29 15138.31
I only want the last price of each day.
So the thing I want to do is to remove all the other prices of each day.
How can I achieve this?
df.index returns the following:
>>> df.index
DatetimeIndex(['2006-01-02 09:00:00', '2006-01-02 09:01:00',
'2006-01-02 09:02:00', '2006-01-02 09:03:00',
'2006-01-02 09:04:00', '2006-01-02 09:05:00',
'2006-01-02 09:06:00', '2006-01-02 09:07:00',
'2006-01-02 09:08:00', '2006-01-02 09:09:00',
...
'2021-01-29 13:29:15', '2021-01-29 13:29:20',
'2021-01-29 13:29:25', '2021-01-29 13:29:30',
'2021-01-29 13:29:35', '2021-01-29 13:29:40',
'2021-01-29 13:29:45', '2021-01-29 13:29:50',
'2021-01-29 13:29:55', '2021-01-29 13:30:00'],
dtype='datetime64[ns]', name='date', length=6345485, freq=None)
df.columns returns:
>>> df.columns
Index(['price'], dtype='object', name='stock_id')
pd.to_datetime(df.index, format='%Y%m%d') returns:
>>> pd.to_datetime(df.index)
DatetimeIndex(['2006-01-02 09:00:00', '2006-01-02 09:01:00',
'2006-01-02 09:02:00', '2006-01-02 09:03:00',
'2006-01-02 09:04:00', '2006-01-02 09:05:00',
'2006-01-02 09:06:00', '2006-01-02 09:07:00',
'2006-01-02 09:08:00', '2006-01-02 09:09:00',
...
'2021-01-29 13:29:15', '2021-01-29 13:29:20',
'2021-01-29 13:29:25', '2021-01-29 13:29:30',
'2021-01-29 13:29:35', '2021-01-29 13:29:40',
'2021-01-29 13:29:45', '2021-01-29 13:29:50',
'2021-01-29 13:29:55', '2021-01-29 13:30:00'],
dtype='datetime64[ns]', name='date', length=6345485, freq=None)
This is strange, because the to_datetime() conversion doesn't seem to have any effect.
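A minimal sketch of why the call appears to do nothing, assuming a tiny made-up index in place of the real one: the index is already datetime64, so pd.to_datetime() has nothing left to parse, while normalize() is the method that resets the time of day to midnight.

```python
import pandas as pd

# Hypothetical three-row index standing in for the real 6-million-row one
idx = pd.DatetimeIndex(
    ["2006-01-02 09:00:00", "2006-01-02 09:01:00", "2006-01-03 09:00:00"],
    name="date",
)

# The values are already datetimes, so to_datetime returns them unchanged
converted = pd.to_datetime(idx)
same = converted.equals(idx)

# normalize() keeps the datetime dtype but sets every time to 00:00:00
dates_only = idx.normalize()
print(same)
print(dates_only)
```

The result still contains duplicate dates, so a separate step is needed to keep one price per day.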
Upvotes: 0
Views: 50
Reputation: 13603
Thanks to XavierBrt's answer, I've worked out the version that fits my DataFrame, where the date is the index rather than a column:
df.index = df.index.normalize()
df = df.groupby(df.index).last()
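For anyone who wants to try this end to end, here is a small sketch on made-up data shaped like the frame in the question (a DatetimeIndex named "date" and a single "price" column). The sample values and the extra sort_index() call are my own assumptions; the sort only guarantees that "last" means the latest timestamp of each day.

```python
import pandas as pd

# Made-up sample: three ticks on one day, one tick on the next
df = pd.DataFrame(
    {"price": [6548.34, 6457.61, 6452.82, 6488.62]},
    index=pd.DatetimeIndex(
        ["2006-01-02 09:00:00", "2006-01-02 09:01:00",
         "2006-01-02 13:30:00", "2006-01-03 13:30:00"],
        name="date",
    ),
)

# Make sure rows are in chronological order before taking the last one
df = df.sort_index()

# Drop the time of day, then keep the final row for each calendar date
df.index = df.index.normalize()
daily = df.groupby(level="date").last()
print(daily)
```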
Upvotes: 0
Reputation: 1249
You can keep only the date part of each datetime, then group by date and keep the last value for each day:
df["date"] = pd.to_datetime(df["date"]).dt.normalize()
df = df.groupby("date").last()
I added pd.to_datetime() in case your date column is not already in datetime format.
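As a rough illustration of the column-based case, here is a sketch on made-up data where "date" is a plain string column (an assumption; in the question it is the index). The sort_values() step is also my addition, so that the last row per day is the latest one even if the input is unordered.

```python
import pandas as pd

# Made-up sample with string timestamps in a regular column
df = pd.DataFrame({
    "date": ["2021-01-29 13:29:55", "2021-01-29 13:30:00", "2021-01-28 13:30:00"],
    "price": [15229.54, 15138.31, 15228.34],
})

# Parse the strings, order the rows in time, then strip the time of day
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")
df["date"] = df["date"].dt.normalize()

# One closing price per calendar date
daily = df.groupby("date")["price"].last()
print(daily)
```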
Upvotes: 1