Reputation: 1644
Is this the expected behaviour of pandas? I expected the output to be unique timestamps. I appreciate that the integers can be converted back to timestamps, but they are not timestamps themselves:
import pandas as pd
df = pd.DataFrame()
df['last_test_data'] = ['2016-12-16', '2016-12-16', '2016-12-18', '2016-12-18', '2016-12-31']
df['last_test_data'] = pd.to_datetime(df['last_test_data'], format="%Y-%m-%d")
df = df.sort_values('last_test_data')
print(df['last_test_data'])
0 2016-12-16
1 2016-12-16
2 2016-12-18
3 2016-12-18
4 2016-12-31
Name: last_test_data, dtype: datetime64[ns]
OS_dates = df['last_test_data'].unique().tolist()
print(OS_dates)
[1481846400000000000, 1482019200000000000, 1483142400000000000]
The .unique().tolist() call seems to convert the timestamps to plain integers, which means I cannot use timestamp attributes on the elements, such as:
for date in OS_dates:
    print(date.month)
The list can be converted back to timestamps with:
OS_dates = [pd.to_datetime(d) for d in OS_dates]
But this is an extra step. I am using Python 3.7.7 and pandas 1.0.5 (please note that I cannot upgrade to the latest version without a lot of hassle, as my workflow runs on a number of other systems).
Upvotes: 2
Views: 430
Reputation: 603
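When using .tolist() on a NumPy array, the data items are converted to the nearest compatible builtin Python type: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tolist.html For datetime64[ns] values, that type is a plain integer holding the number of nanoseconds since the epoch.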
Also, after calling .unique(), I can see that each element of the resulting array is a numpy.datetime64, which does not have a .month attribute.
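To illustrate the conversion described above, here is a minimal sketch. It builds a datetime64[ns] array directly in NumPy (an assumption made so the result does not depend on what .unique() returns in a given pandas version), and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# A nanosecond-resolution datetime array, like the one .unique() gives in pandas 1.0.5.
arr = np.array(["2016-12-16", "2016-12-18"], dtype="datetime64[ns]")

# ndarray.tolist() turns each datetime64[ns] value into a Python int
# (nanoseconds since the epoch), because datetime.datetime only
# supports microsecond resolution.
as_list = arr.tolist()
print(type(as_list[0]))  # <class 'int'>
print(as_list[0])        # 1481846400000000000

# The integer can be turned back into a pandas Timestamp.
ts = pd.Timestamp(as_list[0])
print(ts.month)          # 12
```

This matches the integers shown in the question: they are the same dates, just expressed as nanosecond counts.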
When creating the list, you may use the code below:
OS_dates = list(pd.to_datetime(df['last_test_data'].unique()))
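As a quick check of the line above, here is a self-contained sketch using the data from the question. It also shows one possible alternative, drop_duplicates().tolist(), which stays inside pandas; that alternative is my own suggestion rather than something from the question, and the name alt_dates is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "last_test_data": pd.to_datetime(
        ["2016-12-16", "2016-12-16", "2016-12-18", "2016-12-18", "2016-12-31"],
        format="%Y-%m-%d",
    )
})

# Wrap the unique values in pd.to_datetime so that iterating yields Timestamps.
OS_dates = list(pd.to_datetime(df["last_test_data"].unique()))

# Possible alternative: Series.tolist() boxes datetime values as Timestamps,
# so dropping duplicates on the Series avoids the NumPy round trip.
alt_dates = df["last_test_data"].drop_duplicates().tolist()

months = [d.month for d in OS_dates]
print(months)  # [12, 12, 12]
```

Both lists contain pandas Timestamp objects, so attributes such as .month work without a second conversion step.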
Upvotes: 3