Mike Williamson

Reputation: 3218

How to prevent Pandas from converting datetimes to datetime64

Need

I am trying to export a dataframe to a Parquet file, which will be consumed later in the pipeline by something that is not Python or Pandas (Azure Data Factory).

When I ingest the Parquet file later in the flow, it cannot recognize datetime64[ns]. I would rather just use "vanilla" Python datetime.datetime.

Problem

But I cannot manage to do this. The problem is that Pandas forces any "datetime-like" object into datetime64[ns] once it is back in a dataframe or series.

Small Example

For instance, assume the iris dataset with a "timestamp" column:

>>> df.head()
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)   class                  timestamp
0                5.1               3.5                1.4               0.2  setosa 2021-02-19 15:07:24.719272
1                4.9               3.0                1.4               0.2  setosa 2021-02-19 15:07:24.719272
2                4.7               3.2                1.3               0.2  setosa 2021-02-19 15:07:24.719272
3                4.6               3.1                1.5               0.2  setosa 2021-02-19 15:07:24.719272
4                5.0               3.6                1.4               0.2  setosa 2021-02-19 15:07:24.719272

>>> df.dtypes
sepal length (cm)           float64
sepal width (cm)            float64
petal length (cm)           float64
petal width (cm)            float64
class                      category
timestamp            datetime64[ns]
dtype: object

I can convert a value to a "normal Python datetime":

>>> df.timestamp[1]
Timestamp('2021-02-19 15:07:24.719272')
>>> type(df.timestamp[1])
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

>>> df.timestamp[1].to_pydatetime()
datetime.datetime(2021, 2, 19, 15, 7, 24, 719272)
>>> type(df.timestamp[1].to_pydatetime())
<class 'datetime.datetime'>

But I cannot "keep" it in that type when I convert the entire column / series:

>>> df['ts2'] = df.timestamp.apply(lambda x: x.to_pydatetime())
>>> df.dtypes
sepal length (cm)           float64
sepal width (cm)            float64
petal length (cm)           float64
petal width (cm)            float64
class                      category
timestamp            datetime64[ns]
ts2                  datetime64[ns]

Possible Solutions

I looked for a way to "dumb down" the dataframe column and make its datetimes less precise, but I could not find one. Nor can I see an option to specify column data types upon export via the df.to_parquet() method.

Is there a way to create a plain Python datetime.datetime column (not the Numpy/Pandas datetime64[ns] column) in a Pandas dataframe?

Upvotes: 3

Views: 1741

Answers (2)

Yudi Guzmán

Reputation: 71

In my case, when I tried to convert datetime64[ns] to datetime, I used the dt.date accessor and got a column of object dtype rather than a true date dtype, but it worked:

df['added_column_name'] = pd.to_datetime(df['column_name']).dt.date

df.head()

Now 'added_column_name' has the object dtype.
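For anyone trying this, here is a minimal self-contained sketch of the approach. The two sample rows are made up for illustration; the column names are the placeholders used above. It suggests that the resulting values are datetime.date objects, so the time of day is dropped, which may or may not be acceptable for a timestamp column.

```python
import datetime

import pandas as pd

# Hypothetical sample data; both strings use the same format so parsing is unambiguous.
df = pd.DataFrame(
    {"column_name": ["2021-02-19 15:07:24.719272", "2021-02-20 08:30:00.000000"]}
)

# .dt.date yields plain Python datetime.date objects in an object-dtype column.
df["added_column_name"] = pd.to_datetime(df["column_name"]).dt.date

first = df.loc[0, "added_column_name"]
print(df["added_column_name"].dtype)
print(type(first), first)
```

If the hour, minute and second matter downstream, this approach loses them.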

Upvotes: 1

Quang Hoang

Reputation: 150785

Try forcing dtype='object' when you use to_pydatetime:

df['ts'] = pd.Series(df.timestamp.dt.to_pydatetime(), dtype='object')

df.loc[0,'ts']

Output:

datetime.datetime(2021, 2, 19, 15, 7, 24, 719272)
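A self-contained sketch of the same idea, in case it helps to reproduce it. The two-row frame is made up for illustration, and the list comprehension is a swapped-in variant of the one-liner above: it converts element by element instead of calling the .dt.to_pydatetime() accessor, which I believe newer pandas versions warn about. Pinning dtype='object' is what stops pandas from inferring datetime64[ns] again.

```python
import datetime

import pandas as pd

# Hypothetical sample data standing in for the question's "timestamp" column.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2021-02-19 15:07:24.719272", "2021-02-19 15:07:25.000001"]
        )
    }
)

# Convert each pandas Timestamp to a plain datetime.datetime.
py_values = [ts.to_pydatetime() for ts in df["timestamp"]]

# dtype="object" keeps the Python objects as they are; index=df.index keeps rows aligned.
df["ts"] = pd.Series(py_values, index=df.index, dtype="object")

value = df.loc[0, "ts"]
print(df.dtypes)
print(type(value), value)
```

The 'ts' column should report object in df.dtypes while 'timestamp' stays a datetime64 column.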

Upvotes: 3
