Reputation: 605
I'm using Pyarrow, Pyarrow.Parquet as well as Pandas. When I send a Pandas datetime64[ns]
series to a Parquet file and load it again via a drill query, the query shows an Integer like: 1467331200000000 which seems to be something else than a UNIX timestamp.
The query looks like this:
SELECT workspace.id-column AS id-column, workspace.date-column AS date-column
When I open that file within Python again, it loads correctly and still has its datetime64[ns]
type.
Any idea what's going wrong and how to solve this? I want this value being shown as a regular date.
Upvotes: 3
Views: 598
Reputation: 605
Ok, I found a solution some days ago which I would like to share. I think I initially missed something. It's very important to downcast to [ms] as well as allowing truncating timestamps before sending the dataframe to Parquet for becoming able to open it issue free in Drill:
pq.write_table(table, rf'{name}.parquet',
coerce_timestamps='ms',
allow_truncated_timestamps=True)
When I define a view in Drill I can cast that column as date or timestamp as required.
Upvotes: 3
Reputation: 661
Could you please share parquet schema? Is it contains TIMESTAMP logical type for this column?
For now, you can write your custom UDF which converts BigInt nanoseconds value to timestamp, or use built-in functions (not sure about ns, but there are a lot of functions which accept millis):
select to_timestamp(1467331200000/1000);
+-----------------------+
| EXPR$0 |
+-----------------------+
| 2016-07-01 03:00:00.0 |
+-----------------------+
Upvotes: 0