Reputation: 105
I would like to get a list of all unique dates in the index, reduced to days. I am having trouble because the Date column is set as the index of the DataFrame, and I want to do this without resetting the index.
This is a sample of my DataFrame.
Input:
Open High Low Close Volume
Date
2022-03-31 09:30:00 177.3628 177.5123 176.5451 176.5850 10809
2022-03-31 09:35:00 176.5750 176.6448 176.1163 176.1662 2647
2022-03-31 09:40:00 176.1762 176.1861 175.3984 175.7274 2540
2022-03-31 09:45:00 175.7274 175.7274 175.1192 175.3286 3495
2022-03-31 09:50:00 175.3186 175.6576 175.2987 175.5280 1904
... ... ... ... ... ...
2022-09-21 15:35:00 155.9000 155.9400 154.8400 154.9300 3131
2022-09-21 15:40:00 154.9000 155.2300 154.6000 154.6000 3181
2022-09-21 15:45:00 154.5700 154.6000 154.1300 154.3700 3999
2022-09-21 15:50:00 154.3200 154.8100 154.2400 154.5100 4606
2022-09-21 15:55:00 154.5000 154.5000 153.6700 153.7100 10145
Desired output:
Dates = ["2022-03-31", "2022-04-01", "2022-04-02", ...]
Upvotes: 2
Views: 1774
Reputation: 13582
Considering that the dataframe looks like the following
import pandas as pd
df = pd.DataFrame({'Open': [177.3628, 176.5750, 176.1762, 175.7274, 175.3186, 155.9000, 154.9000, 154.5700, 154.3200, 154.5000],
'High': [177.5123, 176.6448, 176.1861, 175.7274, 175.6576, 155.9400, 155.2300, 154.6000, 154.8100, 154.5000],
'Low': [176.5451, 176.1163, 175.3984, 175.1192, 175.2987, 154.8400, 154.6000, 154.1300, 154.2400, 153.6700],
'Close': [176.5850, 176.1662, 175.7274, 175.3286, 175.5280, 154.9300, 154.6000, 154.3700, 154.5100, 153.7100],
'Volume': [10809, 2647, 2540, 3495, 1904, 3131, 3181, 3999, 4606, 10145]},
index=pd.to_datetime(['2022-03-31 09:30:00', '2022-03-31 09:35:00', '2022-03-31 09:40:00', '2022-03-31 09:45:00', '2022-03-31 09:50:00', '2022-09-21 15:35:00', '2022-09-21 15:40:00', '2022-09-21 15:45:00', '2022-09-21 15:50:00', '2022-09-21 15:55:00']))
[Out]:
Open High Low Close Volume
2022-03-31 09:30:00 177.3628 177.5123 176.5451 176.5850 10809
2022-03-31 09:35:00 176.5750 176.6448 176.1163 176.1662 2647
2022-03-31 09:40:00 176.1762 176.1861 175.3984 175.7274 2540
2022-03-31 09:45:00 175.7274 175.7274 175.1192 175.3286 3495
2022-03-31 09:50:00 175.3186 175.6576 175.2987 175.5280 1904
2022-09-21 15:35:00 155.9000 155.9400 154.8400 154.9300 3131
2022-09-21 15:40:00 154.9000 155.2300 154.6000 154.6000 3181
2022-09-21 15:45:00 154.5700 154.6000 154.1300 154.3700 3999
2022-09-21 15:50:00 154.3200 154.8100 154.2400 154.5100 4606
2022-09-21 15:55:00 154.5000 154.5000 153.6700 153.7100 10145
In this specific case, which I assume is the same as OP's, the dates are of type datetime64[ns] (if not, see the notes below) and are the index of the dataframe.
Therefore, one option is to use a list and a set (an unordered collection with no duplicate elements) as follows
Dates = list(set(df.index.date.tolist()))
[Out]: [datetime.date(2022, 9, 21), datetime.date(2022, 3, 31)]
Then, if one wants to obtain the output as a list of strings, one can do the following
Dates = [str(date) for date in Dates]
[Out]: ['2022-09-21', '2022-03-31']
A one-liner to obtain the desired output would be like this (making the list is not required here)
Dates = [str(date) for date in set(df.index.date.tolist())]
[Out]: ['2022-03-31', '2022-09-21']
The dates happen to come out sorted in this specific case, but a set is unordered, so the order is not guaranteed. If one wants the dates sorted, one can use sorted() as follows
Dates = [str(date) for date in sorted(set(df.index.date.tolist()))]
[Out]: ['2022-03-31', '2022-09-21']
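For anyone who wants to try this without the full data, here is a minimal self-contained sketch of the sorted-set approach. The three-row frame is a hypothetical miniature of the one above, not OP's actual data.

```python
import pandas as pd

# Hypothetical miniature of the frame above: three intraday rows over two days.
df = pd.DataFrame(
    {"Close": [176.5850, 176.1662, 153.7100]},
    index=pd.to_datetime(
        ["2022-03-31 09:30:00", "2022-03-31 09:35:00", "2022-09-21 15:55:00"]
    ),
)

# DatetimeIndex.date yields datetime.date objects; set() drops duplicates,
# sorted() puts them in chronological order, str() gives 'YYYY-MM-DD' strings.
dates = [str(d) for d in sorted(set(df.index.date))]
print(dates)  # ['2022-03-31', '2022-09-21']

# The index is only read, never reset or reassigned.
print(len(df.index))  # 3
```

Note that nothing here touches the index itself, which is what the question asks for.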
Option 2
Even though OP doesn't want to use additional libraries, instead of list() and set() (and sorted()), one can use numpy.unique. This function returns the sorted unique elements of an array. It would be as follows
import numpy as np
Dates = [str(date) for date in np.unique(df.index.date).tolist()]
[Out]: ['2022-03-31', '2022-09-21']
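If one would rather stay within pandas, a possible variant (not part of the options above, just a sketch assuming the index is a DatetimeIndex) is to truncate every timestamp to midnight with normalize() and then take the unique values. The small frame below is hypothetical.

```python
import pandas as pd

# Hypothetical miniature frame with a DatetimeIndex.
df = pd.DataFrame(
    {"Close": [176.5850, 176.1662, 153.7100]},
    index=pd.to_datetime(
        ["2022-03-31 09:30:00", "2022-03-31 09:35:00", "2022-09-21 15:55:00"]
    ),
)

# normalize() sets the time part to 00:00:00, unique() drops repeated days,
# sort_values() guards against an index that is not in chronological order.
days = df.index.normalize().unique().sort_values()

# strftime() turns the remaining timestamps into plain strings.
dates = days.strftime("%Y-%m-%d").tolist()
print(dates)  # ['2022-03-31', '2022-09-21']
```

Here days is still a DatetimeIndex, which can be handy if the unique days are needed for further date arithmetic rather than as strings.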
Notes:
If the index is not of datetime type, one can do the following
df.index = pd.to_datetime(df.index)
If one wants a different date format, for example one that keeps the minutes, one can use pandas.DatetimeIndex.strftime with Python format codes as follows
Dates = [str(date) for date in sorted(set(df.index.strftime('%Y-%m-%d %H:%M').tolist()))]
[Out]: ['2022-03-31 09:30', '2022-03-31 09:35', '2022-03-31 09:40', '2022-03-31 09:45', '2022-03-31 09:50', '2022-09-21 15:35', '2022-09-21 15:40', '2022-09-21 15:45', '2022-09-21 15:50', '2022-09-21 15:55']
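The two notes can be combined. The sketch below starts from a hypothetical frame whose index holds plain strings, converts it, and then collects the unique hours instead of the unique days. The hour-level format string is only an illustration.

```python
import pandas as pd

# Hypothetical frame whose index holds plain strings rather than timestamps.
df = pd.DataFrame(
    {"Close": [176.5850, 176.1662, 153.7100]},
    index=["2022-03-31 09:30:00", "2022-03-31 09:35:00", "2022-09-21 15:55:00"],
)

# Convert first; a plain string index has no .date or .strftime.
df.index = pd.to_datetime(df.index)

# strftime already returns strings, so no extra str() call is needed.
# The literal ':00' is copied through unchanged by the format string.
hours = sorted(set(df.index.strftime("%Y-%m-%d %H:00")))
print(hours)  # ['2022-03-31 09:00', '2022-09-21 15:00']
```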
Upvotes: 1
Reputation: 248
There are quite a few ways to solve this. Technically the easiest is to reset the index (this assumes the index is named Date, as in the question):
l = list(df.reset_index().Date.apply(lambda x: x.date()).unique())
reset_index() returns a new DataFrame, so you do not have to modify the dataframe object to do this.
You could also just do something like:
l = []
for x in df.index:
    d = x.date()  # reduce the timestamp to its day
    if d not in l:
        l.append(d)
Here is an example of the reset_index() pattern on a dummy frame (column 0 stands in for the Date column):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.zeros((10,10)))
list(df.reset_index()[0].apply(lambda x:x).unique())
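As an order-preserving alternative to the manual loop above, one could also lean on dict.fromkeys, which keeps the first occurrence of each key. This is only a sketch on a hypothetical frame, and it assumes the index is already a DatetimeIndex.

```python
import pandas as pd

# Hypothetical frame whose index is deliberately not in chronological order.
df = pd.DataFrame(
    {"Close": [153.7100, 176.5850, 176.1662]},
    index=pd.to_datetime(
        ["2022-09-21 15:55:00", "2022-03-31 09:30:00", "2022-03-31 09:35:00"]
    ),
)

# dict keys keep insertion order (Python 3.7+), so duplicates drop out
# while the first-seen order of the days is preserved.
dates = list(dict.fromkeys(df.index.date))
print([str(d) for d in dates])  # ['2022-09-21', '2022-03-31']
```

Unlike the list-membership loop, this does not rescan the collected values on every step, so it should scale better on long intraday indexes.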
Upvotes: 0