dovregubben

Snowflake - how to read metadata from parquet files in S3

We are using external tables in our Snowflake database to read data from some AWS S3 buckets. The buckets contain various Parquet files, spread over multiple partitions.

We are able to read the data from our external table by using Snowflake's stages, storage integrations and file formats.

However, we'd like to read some metadata from the parquet files as well, such as the precision of numeric data types (e.g., to find out how many decimal places we have to deal with).

To keep it simple, let's say we're reading data from one single parquet file.

Is there any way to retrieve the precision of numeric data types from that Parquet file's metadata directly from Snowflake?

Or would you rather extract that metadata from, say, the AWS Glue Data Catalog or some other external tool?

Answers (1)

Greg Pavlik

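There's a recent public preview feature, the INFER_SCHEMA table function, that will do this. It detects the column definitions in a set of staged data files and returns one row per column; its TYPE column holds the inferred Snowflake data type, which for decimal columns includes the precision and scale (e.g. NUMBER(38, 2)):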

SELECT *
FROM TABLE(
  INFER_SCHEMA(
    LOCATION => '@<stage_name>/<path>/'
    , FILE_FORMAT => '<format_name>'
  )
);

https://docs.snowflake.com/en/sql-reference/functions/infer_schema.html
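To get at the number of decimal places specifically, the TYPE strings returned by INFER_SCHEMA can be split into precision and scale with REGEXP_SUBSTR. The following is a minimal sketch, not a tested recipe: it assumes a hypothetical named file format my_parquet_format, a hypothetical external stage my_ext_stage whose path points at the folder holding the Parquet file, and that decimal columns are reported with a TYPE of the form NUMBER(p, s). Columns of any other type simply get NULL in the two extracted columns.

```sql
-- Hypothetical named file format; INFER_SCHEMA expects the format by name.
CREATE FILE FORMAT IF NOT EXISTS my_parquet_format TYPE = PARQUET;

-- Assumption: decimal columns come back as e.g. 'NUMBER(38, 2)'.
-- [(] and [)] match literal parentheses without needing backslash escapes.
SELECT
  column_name,
  type,
  REGEXP_SUBSTR(type, 'NUMBER[(]([0-9]+), *([0-9]+)[)]', 1, 1, 'e', 1)::INT AS numeric_precision,
  REGEXP_SUBSTR(type, 'NUMBER[(]([0-9]+), *([0-9]+)[)]', 1, 1, 'e', 2)::INT AS numeric_scale
FROM TABLE(
  INFER_SCHEMA(
    LOCATION => '@my_ext_stage/some/partition/',  -- hypothetical stage and path
    FILE_FORMAT => 'my_parquet_format'
  )
);
```

The numeric_scale value is the number of decimal places for that column; if the inferred types look different in your account, adjust the pattern to match.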
