user3062260

Reputation: 1644

Pandas timestamps are converted to integers when forced into a unique list - possible bug?

Is this the expected behaviour of pandas? I expected the output to be unique timestamps. I appreciate that the integers can be converted back to timestamps, but they are not timestamps themselves:

import pandas as pd
df = pd.DataFrame()
df['last_test_data'] = ['2016-12-16', '2016-12-16', '2016-12-18', '2016-12-18', '2016-12-31']
df['last_test_data'] = pd.to_datetime(df['last_test_data'], format="%Y-%m-%d")
df = df.sort_values('last_test_data')

print(df['last_test_data'])

0   2016-12-16
1   2016-12-16
2   2016-12-18
3   2016-12-18
4   2016-12-31
Name: last_test_data, dtype: datetime64[ns]


OS_dates = df['last_test_data'].unique().tolist()
print(OS_dates)

[1481846400000000000, 1482019200000000000, 1483142400000000000]

The .unique().tolist() call seems to turn the timestamps into plain integers, which means I cannot use timestamp attributes on the list items, such as:

for date in OS_dates:
    print(date.month)

The items can be converted back to timestamps with:

OS_dates = [pd.to_datetime(d) for d in OS_dates]

But this is an extra step. I am using Python 3.7.7 and pandas 1.0.5 (please note that I cannot upgrade to the latest version without a lot of hassle, as my workflow runs on a number of other systems).

Upvotes: 2

Views: 430

Answers (1)

jlb_gouveia

Reputation: 603

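When you call .tolist() on a NumPy array, each item is converted to the nearest compatible built-in Python type: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tolist.html. A datetime64[ns] value has nanosecond precision, which datetime.datetime cannot hold, so it comes back as an integer count of nanoseconds since the Unix epoch.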
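To illustrate the conversion, here is a minimal sketch using NumPy only. The array is built by hand to stand in for what .unique() returned in the question, and the names ns_values, as_ints and as_datetimes are made up for this example:

```python
import numpy as np

# Hand-built stand-in for the datetime64[ns] array from the question.
ns_values = np.array(["2016-12-16", "2016-12-18"], dtype="datetime64[ns]")

# Nanosecond precision does not fit in datetime.datetime,
# so tolist() falls back to integers (nanoseconds since the Unix epoch).
as_ints = ns_values.tolist()
print(as_ints)

# Microsecond precision does fit, so tolist() yields datetime.datetime objects.
as_datetimes = ns_values.astype("datetime64[us]").tolist()
print(as_datetimes[0].month)
```

The first print should show the same kind of large integers as in the question, while the elements of the second list do respond to .month.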

Also, after calling .unique(), I can see that each element has the type numpy.datetime64, which does not have a .month attribute.

When creating the list, you may use the code below:

OS_dates = list(pd.to_datetime(df['last_test_data'].unique()))
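As a self-contained sketch, the snippet below rebuilds the question's data and applies the line above. It also shows one possible alternative that is not part of the original answer: drop_duplicates() followed by Series.tolist(), which stays inside pandas and should also yield Timestamp objects. The names os_dates and os_dates_alt are illustrative only:

```python
import pandas as pd

# Rebuild the column from the question.
df = pd.DataFrame()
df["last_test_data"] = ["2016-12-16", "2016-12-16", "2016-12-18", "2016-12-18", "2016-12-31"]
df["last_test_data"] = pd.to_datetime(df["last_test_data"], format="%Y-%m-%d")

# Approach from the answer: wrap the unique values in pd.to_datetime first.
os_dates = list(pd.to_datetime(df["last_test_data"].unique()))

# Alternative (an assumption of this sketch, not from the answer):
# de-duplicate as a Series, then let Series.tolist() return Timestamps.
os_dates_alt = df["last_test_data"].drop_duplicates().tolist()

for date in os_dates:
    print(date.month)
```

Both lists hold pandas Timestamp objects, so attributes such as .month and .day work without a separate conversion step.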

Upvotes: 3
