Apache-Drill doesn't understand Pandas datetime64[ns]

Question

I'm using pyarrow, pyarrow.parquet, and pandas. When I write a pandas datetime64[ns] series to a Parquet file and load it again via a Drill query, the query shows an integer such as 1467331200000000, which seems to be something other than a UNIX timestamp.
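For reference, the integer can be decoded with the Python standard library. This is only a sketch, and it assumes the value counts microseconds since 1970-01-01 UTC (the variable names are illustrative):

```python
from datetime import datetime, timezone

raw = 1467331200000000  # value shown by the Drill query

# Assumption: the integer counts microseconds since the UNIX epoch (UTC).
seconds = raw // 1_000_000
decoded = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(decoded.isoformat())  # 2016-07-01T00:00:00+00:00
```

Under that assumption the value is a plausible date, which suggests the data itself is intact and only the timestamp unit is being misread.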

The query looks like this:

SELECT workspace.id-column AS id-column, workspace.date-column AS date-column

When I open that file within Python again, it loads correctly and still has its datetime64[ns] type.

Any idea what's going wrong and how to solve this? I want this value to be shown as a regular date.

Christian · Accepted Answer

OK, I found a solution a few days ago that I would like to share. I think I initially missed something. It is very important to downcast the timestamps to [ms] and to allow truncated timestamps when writing the dataframe to Parquet, so that the file opens without issues in Drill:

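import pyarrow.parquet as pq

# Store timestamps as milliseconds and allow sub-millisecond precision to be dropped
pq.write_table(table, f'{name}.parquet',
               coerce_timestamps='ms',
               allow_truncated_timestamps=True)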

When I define a view in Drill I can cast that column as date or timestamp as required.
