Mohan

NOT_SUPPORTED: Unsupported Trino column type (timestamp(3)) for Parquet column

I have a pandas dataframe that has a timedelta column.

df['dep_time'] = pd.to_timedelta(df.loc[:, 'dep_time'])

df.dtypes shows this column as:

dep_time       timedelta64[ns]
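Here is a minimal, self-contained reproduction of the setup above. It assumes dep_time originally holds "HH:MM:SS" strings. The sample values are hypothetical, since the question doesn't show the raw data.

```python
import pandas as pd

# Hypothetical sample data: departure times as "HH:MM:SS" strings
df = pd.DataFrame({"dep_time": ["08:15:00", "13:40:30", "23:05:00"]})

# Same conversion as in the question; df.loc[:, "dep_time"] selects the column
df["dep_time"] = pd.to_timedelta(df.loc[:, "dep_time"])

# Expect a timedelta64 dtype for dep_time
print(df.dtypes)
```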

Next, I save this DataFrame to a Parquet file using:

df.to_parquet('parquet_file.parquet', engine='fastparquet', index=False)

When I inspect the Parquet file with the parquet-tools command-line utility, the column type is shown as:

############ Column(dep_time) ############
name: dep_time
path: dep_time
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: Time(isAdjustedToUTC=true, timeUnit=microseconds)
converted_type (legacy): TIME_MICROS
compression: SNAPPY (space_saved: 94%)
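In that output, physical_type INT64 and timeUnit=microseconds mean that each value is stored as an integer count of microseconds, annotated with Parquet's TIME (time-of-day) logical type. The sketch below computes those integers with pandas instead of reading them back from the file, so it does not depend on fastparquet internals. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample values
df = pd.DataFrame({"dep_time": pd.to_timedelta(["08:15:00", "13:40:30"])})

# Floor-divide by one microsecond to get int64 microsecond counts,
# the unit reported as TIME(MICROS) in the Parquet metadata
micros = df["dep_time"] // pd.Timedelta(microseconds=1)

print(micros.tolist())  # [29700000000, 49230000000]
```

For example, 08:15:00 is 8 * 3600 + 15 * 60 = 29,700 seconds, or 29,700,000,000 microseconds.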

I upload this Parquet file to an S3 location, and now I need to query it using AWS Athena.

I created a table in Athena with this column's type set to timestamp. However, when I query the table, I keep getting this error:

NOT_SUPPORTED: Unsupported Trino column type (timestamp(3)) for Parquet column ([dep_time] optional int64 dep_time (TIME(MICROS,true)))

My best guess is that timestamp is not the correct Athena type for a timedelta column, but I am not sure what is.

Do I need to change the column type in Athena, and if so, to what? I have tried string, int, and more (praying to God), but NOTHING WORKS :( !!!

PLEASE HELP.
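One possible workaround is sketched below. It is an untested assumption, not a confirmed fix. Changing only the Athena column type leaves the Parquet file itself unchanged, so the column still carries the TIME(MICROS) annotation. Converting the timedelta to a plain integer before writing should produce an INT64 column without that annotation, which could then be declared as bigint in Athena. The dep_time_s column name is hypothetical.

```python
import pandas as pd

# Hypothetical sample values standing in for the real dep_time column
df = pd.DataFrame({"dep_time": pd.to_timedelta(["08:15:00", "13:40:30"])})

# Hypothetical replacement column: whole seconds as plain int64
# (sub-second precision is truncated by the floor division)
df["dep_time_s"] = (df["dep_time"] // pd.Timedelta(seconds=1)).astype("int64")
df = df.drop(columns=["dep_time"])

print(df.dtypes)

# Then write as before, for example:
# df.to_parquet("parquet_file.parquet", engine="fastparquet", index=False)
```

The matching Athena column would then be something like dep_time_s bigint. If sub-second precision matters, divide by pd.Timedelta(microseconds=1) instead to keep microseconds.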
