I have a pandas dataframe that has a timedelta column.
df['dep_time'] = pd.to_timedelta(df.loc[:, 'dep_time'])
df.dtypes
shows this column as:
dep_time timedelta64[ns]
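For anyone who wants to reproduce this, here is a minimal sketch of the conversion step. The sample values are made up for illustration (my real data is different), and I am assuming the column starts out as `HH:MM:SS` strings:

```python
import pandas as pd

# Hypothetical sample data; the real dep_time values are not shown in this post.
df = pd.DataFrame({"dep_time": ["08:15:00", "13:40:30", "23:05:10"]})

# Parse the strings into timedelta values (durations, not points in time).
df["dep_time"] = pd.to_timedelta(df["dep_time"])

# The dep_time column should now be reported with a timedelta64 dtype.
print(df.dtypes)
```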
Next, I save this dataframe to a Parquet file using:
df.to_parquet('parquet_file.parquet', engine='fastparquet', index=False)
When I inspect the Parquet file using the parquet-tools command-line utility, the column type is shown as:
############ Column(dep_time) ############
name: dep_time
path: dep_time
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: Time(isAdjustedToUTC=true, timeUnit=microseconds)
converted_type (legacy): TIME_MICROS
compression: SNAPPY (space_saved: 94%)
I upload this parquet file to an S3 location and now I need to query data from this file using AWS Athena.
I have created a table in Athena with the type of this column set to timestamp. However, when I query the table, I keep getting this error:
NOT_SUPPORTED: Unsupported Trino column type (timestamp(3)) for Parquet column ([dep_time] optional int64 dep_time (TIME(MICROS,true)))
My best guess is that timestamp is not the correct Athena type for a timedelta column, but I am not sure what is.
Do I need to change the column type in Athena, and if so, to what? I have tried setting it to string and int, but neither works.
Any help would be appreciated.
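In case no Athena type can read a TIME(MICROS) column, one fallback I am considering is to store the duration as a plain integer count of microseconds before writing the file. My assumption is that this would be written as an unannotated INT64 column that could be declared as bigint in Athena, but I have not verified that. A rough sketch (the helper name `timedelta_to_micros` and the sample values are made up for illustration):

```python
import pandas as pd


def timedelta_to_micros(series: pd.Series) -> pd.Series:
    """Convert a timedelta column to a count of whole microseconds."""
    # Floor-dividing by a one-microsecond Timedelta yields plain numbers.
    # Caveat: if the column contains NaT, those entries come back as NaN
    # and the result is a float column rather than an integer one.
    return series // pd.Timedelta(microseconds=1)


# Hypothetical sample data standing in for the real dep_time column.
df = pd.DataFrame({"dep_time": pd.to_timedelta(pd.Series(["08:15:00", "00:00:01"]))})
df["dep_time_us"] = timedelta_to_micros(df["dep_time"])
print(df)
```

The original duration could then be recovered on the pandas side with `pd.to_timedelta(df["dep_time_us"], unit="us")`. I would still prefer to keep the timedelta column as it is if Athena has a matching type.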