Brian

Reputation: 13603

Removing data in a pandas DataFrame

I have a DataFrame that contains the price of a ticker like this:

date                    price
2006-01-02 09:00:00     6548.34
2006-01-02 09:01:00     6457.61
2006-01-02 09:02:00     6452.82
2006-01-02 09:03:00     6452.39
2006-01-02 09:04:00     6451.61
                         ...
2021-01-29 13:29:40    15229.54
2021-01-29 13:29:45    15229.54
2021-01-29 13:29:50    15229.54
2021-01-29 13:29:55    15229.54
2021-01-29 13:30:00    15138.31

How can I remove the hour, minute, and second parts from the date?

I want something like this:

date          price
2006-01-02    6577.33
2006-01-03    6488.62
2006-01-04    6469.82
2006-01-05    6555.39
2006-01-06    6448.65
                         ...
2021-01-25    15229.53
2021-01-26    15129.34
2021-01-27    14129.64
2021-01-28    15228.34
2021-01-29    15138.31

I only want the last price of each day.

In other words, I want to drop all the other prices for each day.

How can I achieve this?

My df.index returns the following

>>> df.index
DatetimeIndex(['2006-01-02 09:00:00', '2006-01-02 09:01:00',
               '2006-01-02 09:02:00', '2006-01-02 09:03:00',
               '2006-01-02 09:04:00', '2006-01-02 09:05:00',
               '2006-01-02 09:06:00', '2006-01-02 09:07:00',
               '2006-01-02 09:08:00', '2006-01-02 09:09:00',
               ...
               '2021-01-29 13:29:15', '2021-01-29 13:29:20',
               '2021-01-29 13:29:25', '2021-01-29 13:29:30',
               '2021-01-29 13:29:35', '2021-01-29 13:29:40',
               '2021-01-29 13:29:45', '2021-01-29 13:29:50',
               '2021-01-29 13:29:55', '2021-01-29 13:30:00'],
              dtype='datetime64[ns]', name='date', length=6345485, freq=None)

df.columns returns

>>> df.columns
Index(['price'], dtype='object', name='stock_id')

Calling pd.to_datetime(df.index) returns the same thing:

>>> pd.to_datetime(df.index)
DatetimeIndex(['2006-01-02 09:00:00', '2006-01-02 09:01:00',
               '2006-01-02 09:02:00', '2006-01-02 09:03:00',
               '2006-01-02 09:04:00', '2006-01-02 09:05:00',
               '2006-01-02 09:06:00', '2006-01-02 09:07:00',
               '2006-01-02 09:08:00', '2006-01-02 09:09:00',
               ...
               '2021-01-29 13:29:15', '2021-01-29 13:29:20',
               '2021-01-29 13:29:25', '2021-01-29 13:29:30',
               '2021-01-29 13:29:35', '2021-01-29 13:29:40',
               '2021-01-29 13:29:45', '2021-01-29 13:29:50',
               '2021-01-29 13:29:55', '2021-01-29 13:30:00'],
              dtype='datetime64[ns]', name='date', length=6345485, freq=None)

This is strange, because the to_datetime() conversion doesn't seem to change anything.

Upvotes: 0

Views: 50

Answers (2)

Brian

Reputation: 13603

Thanks to XavierBrt's answer, I've figured out how to apply it to my DataFrame, where the dates are in the index rather than in a column:

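# Set every timestamp in the index to midnight, keeping the datetime64 dtype
df.index = df.index.normalize()
# Keep the last row of each day; last() does not take a column name
df = df.groupby(df.index).last()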
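
Here is a minimal sketch of those two lines on a small frame. The timestamps and prices are invented for illustration and are not the real data. It also sorts the index first, on the assumption that "last" should mean the latest price of the day.

```python
import pandas as pd

# Hypothetical sample data: intraday prices over two days, indexed by timestamp.
idx = pd.DatetimeIndex(
    [
        "2006-01-02 09:00:00",
        "2006-01-02 09:01:00",
        "2006-01-02 13:30:00",
        "2006-01-03 09:00:00",
        "2006-01-03 13:30:00",
    ],
    name="date",
)
df = pd.DataFrame({"price": [6548.34, 6457.61, 6577.33, 6500.00, 6488.62]}, index=idx)

# Assumption: "last" should mean the chronologically latest row of each day,
# so make sure the index is sorted before grouping.
df = df.sort_index()

# Set every timestamp to midnight, then keep the last row per calendar day.
df.index = df.index.normalize()
daily = df.groupby(df.index).last()
print(daily)
```

The result keeps a DatetimeIndex named date, with one row per day.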

Upvotes: 0

XavierBrt

Reputation: 1249

You can keep only the date part of each datetime, then group by the date and keep the last value for each day:

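# Assumes the dates are in a regular column named "date"
df["date"] = pd.to_datetime(df["date"]).dt.normalize()
# Keep the last row of each day; last() does not take a column name
df = df.groupby("date").last()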

I added pd.to_datetime() in case your date column is not already in datetime format.
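
The frame in the question keeps its dates in the index rather than in a "date" column. One way this approach might be applied there is to call reset_index() first. This is only a sketch: the sample values are invented, and the timestamps are stored as strings to show where pd.to_datetime() matters.

```python
import pandas as pd

# Hypothetical sample data with the timestamps stored as strings in the index.
df = pd.DataFrame(
    {"price": [6548.34, 6457.61, 6577.33, 6500.00, 6488.62]},
    index=pd.Index(
        [
            "2006-01-02 09:00:00",
            "2006-01-02 09:01:00",
            "2006-01-02 13:30:00",
            "2006-01-03 09:00:00",
            "2006-01-03 13:30:00",
        ],
        name="date",
    ),
)

# Move the index into a regular "date" column so that df["date"] exists.
df = df.reset_index()

# Parse the strings into datetimes, then set the time part to midnight.
df["date"] = pd.to_datetime(df["date"]).dt.normalize()

# Assumption: rows are already in chronological order, so last() picks
# the latest price of each day.
daily = df.groupby("date")["price"].last()
print(daily)
```

Selecting ["price"] before last() returns a Series. Drop that selection to keep a DataFrame.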

Upvotes: 1
