How to prevent Pandas from converting datetimes to datetime64

Need

I am trying to export a dataframe to a Parquet file that will be consumed later in the pipeline by something that is not Python or Pandas (Azure Data Factory).

When I ingest the Parquet file later in the flow, the consumer cannot recognize the datetime64[ns] column. I would rather just use "vanilla" Python datetime.datetime.

Problem

But I cannot manage to do this. The problem is that Pandas forces any "datetime-like" object into datetime64[ns] once it is back in a dataframe or series.

Small Example

For instance, assume the iris dataset with a "timestamp" column:

>>> df.head()
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)   class                  timestamp
0                5.1               3.5                1.4               0.2  setosa 2021-02-19 15:07:24.719272
1                4.9               3.0                1.4               0.2  setosa 2021-02-19 15:07:24.719272
2                4.7               3.2                1.3               0.2  setosa 2021-02-19 15:07:24.719272
3                4.6               3.1                1.5               0.2  setosa 2021-02-19 15:07:24.719272
4                5.0               3.6                1.4               0.2  setosa 2021-02-19 15:07:24.719272

>>> df.dtypes
sepal length (cm)           float64
sepal width (cm)            float64
petal length (cm)           float64
petal width (cm)            float64
class                      category
timestamp            datetime64[ns]
dtype: object

I can convert a value to a "normal Python datetime":

>>> df.timestamp[1]
Timestamp('2021-02-19 15:07:24.719272')
>>> type(df.timestamp[1])
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

>>> df.timestamp[1].to_pydatetime()
datetime.datetime(2021, 2, 19, 15, 7, 24, 719272)
>>> type(df.timestamp[1].to_pydatetime())
<class 'datetime.datetime'>

But I cannot "keep" it in that type when I convert the entire column / series:

>>> df['ts2'] = df.timestamp.apply(lambda x: x.to_pydatetime())
>>> df.dtypes
sepal length (cm)           float64
sepal width (cm)            float64
petal length (cm)           float64
petal width (cm)            float64
class                      category
timestamp            datetime64[ns]
ts2                  datetime64[ns]
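For reference, a minimal sketch that reproduces the behaviour on a tiny stand-in frame (the one-column `df` below is hypothetical, not the iris data). It also shows one possible workaround under the assumption that an object-dtype column is acceptable: building the Series with an explicit `dtype=object` so that pandas does not re-infer a datetime64 dtype. How a Parquet writer then treats such a column is a separate matter and is not covered here.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the iris frame: only the timestamp column matters.
df = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2021-02-19 15:07:24.719272"] * 3)}
)

# apply() hands back datetime.datetime objects, but pandas re-infers
# a datetime64 dtype when it assembles the resulting Series.
inferred = df["timestamp"].apply(lambda x: x.to_pydatetime())

# Passing dtype=object explicitly skips that inference step, so the
# column keeps the plain Python datetime.datetime objects.
df["ts2"] = pd.Series(
    [ts.to_pydatetime() for ts in df["timestamp"]],
    index=df.index,
    dtype=object,
)

print(inferred.dtype)
print(df["ts2"].dtype)
print(type(df["ts2"].iloc[0]))
```

The trade-off is that an object column loses the vectorised `.dt` accessor and is slower than a native datetime64 column.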

Possible Solutions

I looked to see if there was anything I could do to "dumb down" the dataframe column and make its datetimes less precise, but I cannot see anything. Nor can I see an option to specify column data types upon export via the df.to_parquet() method.

Is there a way to create a plain Python datetime.datetime column (not the Numpy/Pandas datetime64[ns] column) in a Pandas dataframe?
