rainingdistros

Reputation: 637

How to access Iceberg tables created in the GCP ecosystem from within Snowflake and external ETL tools

We have files in GCS which are planned for ETL via Dataproc/Dataflow. The idea is to create Iceberg tables and have them accessible from within Snowflake. We will also need to access the Iceberg tables using ETL tools (JDBC/ODBC) and code outside of GCP and Snowflake. Please note that in most cases the Iceberg tables will be accessed by table name and not by GCS path (at least in GUI tools). Transformations happen within Dataproc and cannot be performed within Snowflake, so Snowpipe is out. A secondary aim is to reduce Snowflake costs for this scenario.
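
For reference, the Dataproc side would look roughly like the following PySpark sketch. This assumes the Iceberg Spark runtime jar is on the cluster; the bucket, catalog, namespace, and table names are all placeholders, and the Hadoop-style catalog is just for illustration:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-on-gcs")
    # Register an Iceberg catalog named "gcs_iceberg" whose warehouse lives in GCS.
    .config("spark.sql.catalog.gcs_iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.gcs_iceberg.type", "hadoop")
    .config("spark.sql.catalog.gcs_iceberg.warehouse", "gs://my-bucket/iceberg-warehouse")
    .getOrCreate()
)

# Read the raw GCS files, transform in Spark, and write out as an Iceberg table.
raw = spark.read.parquet("gs://my-bucket/raw/events/")
transformed = raw.filter("event_type IS NOT NULL")
transformed.writeTo("gcs_iceberg.analytics.events").createOrReplace()
```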

What is the best way to achieve this?

  1. Is it possible to connect to and use the built-in Snowflake catalog from within Dataproc? (See the first sketch below this list.)

  2. Is the Dataproc Metastore service a possibility from within Dataproc? If yes, how can tables registered there be accessed from within Snowflake and other tools?

  3. I have been through the external catalog page in Snowflake, but the options are not very clear (to me, at least). Can tables created in external systems be registered in the Snowflake catalog? Is Open Catalog a separate service, or is it the same as the OSS Polaris catalog? (See the second sketch below this list.)
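
Regarding question 1, the closest thing I have found is the Iceberg Snowflake catalog module (org.apache.iceberg.snowflake.SnowflakeCatalog), which, as I understand it, gives Spark read-only access to Snowflake-managed Iceberg tables over JDBC, so Dataproc could query but not create tables through it. A sketch, assuming the Snowflake JDBC driver and Iceberg runtime jars are on the cluster; the account, credentials, and table names are placeholders:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("read-snowflake-iceberg")
    .config("spark.sql.catalog.sf", "org.apache.iceberg.spark.SparkCatalog")
    # This catalog implementation talks to Snowflake's built-in catalog over JDBC.
    .config("spark.sql.catalog.sf.catalog-impl", "org.apache.iceberg.snowflake.SnowflakeCatalog")
    .config("spark.sql.catalog.sf.uri", "jdbc:snowflake://<account>.snowflakecomputing.com")
    .config("spark.sql.catalog.sf.jdbc.user", "<user>")
    .config("spark.sql.catalog.sf.jdbc.password", "<password>")
    .getOrCreate()
)

# Tables are addressed by name as <database>.<schema>.<table>, not by GCS path.
spark.sql("SELECT COUNT(*) FROM sf.analytics.public.events").show()
```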
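
Regarding question 3, registering a table created outside Snowflake seems possible by pointing an unmanaged Iceberg table at its metadata file through an external volume and an object-store catalog integration. A sketch using snowflake-connector-python, where every name and path is a placeholder:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# One-time setup: an external volume for the GCS bucket, and a catalog
# integration that reads Iceberg metadata straight from object storage.
cur.execute("""
CREATE EXTERNAL VOLUME IF NOT EXISTS gcs_vol
  STORAGE_LOCATIONS = ((
    NAME = 'gcs_loc'
    STORAGE_PROVIDER = 'GCS'
    STORAGE_BASE_URL = 'gcs://my-bucket/iceberg-warehouse/'
  ))
""")
cur.execute("""
CREATE CATALOG INTEGRATION IF NOT EXISTS obj_store_cat
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = ICEBERG
  ENABLED = TRUE
""")

# Register the externally created table by its current metadata file.
cur.execute("""
CREATE ICEBERG TABLE IF NOT EXISTS events
  EXTERNAL_VOLUME = 'gcs_vol'
  CATALOG = 'obj_store_cat'
  METADATA_FILE_PATH = 'analytics/events/metadata/v1.metadata.json'
""")
```

If I understand the docs correctly, a table registered this way also has to be refreshed (ALTER ICEBERG TABLE ... REFRESH) whenever Dataproc commits a new snapshot, which is part of what I am hoping to clarify.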

I hope the question does not get downvoted as asking for opinions or for not having enough information.

Thank you for all the help.

Cheers

Upvotes: 1

Views: 90

Answers (0)
