gontxomde

Reputation: 152

Can I duplicate an Iceberg table by copying its S3 files?

I am relatively new to Iceberg. Our workflow is: we create an Iceberg table in Lake Formation and ingest data from a SQL database into it. The Iceberg tables use S3 as storage for both data and metadata.

Now we want to copy the data to another environment to run other tests. Can I duplicate the table by copying all the S3 files that belong to it? That way I could avoid re-extracting all the data from the source RDBMS.

Upvotes: 0

Views: 524

Answers (1)

Tushar Choudhary

Reputation: 1

The answer depends on what the other environment is and on how you extract and load data.

If you have already mirrored the data from the source (the RDBMS) into the sink (Iceberg on S3), you should ideally never have to query the source database again.

An Iceberg table retains its snapshot history, so copying every version of its data and metadata files could become costly.

reference

The best practice is to copy a single snapshot of the data for experimentation, which you can do with any query engine such as Spark or Trino.
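As a rough illustration of the snapshot copy, here is a minimal sketch that builds a Spark SQL `CREATE TABLE ... AS SELECT` statement reading one snapshot with `VERSION AS OF`. The helper `build_snapshot_copy_sql`, the table names `prod.db.orders` and `test.db.orders`, and the snapshot id are all hypothetical, and it assumes a Spark session (3.3 or later) where both Iceberg catalogs are already configured.

```python
from typing import Optional


def build_snapshot_copy_sql(
    source: str, target: str, snapshot_id: Optional[int] = None
) -> str:
    """Return a Spark SQL CTAS statement copying one snapshot of an Iceberg table.

    Hypothetical helper: `source` and `target` are fully qualified table names.
    When `snapshot_id` is None, the current snapshot of `source` is read.
    """
    select = f"SELECT * FROM {source}"
    if snapshot_id is not None:
        # Iceberg time travel in Spark SQL: read a specific snapshot by id.
        select += f" VERSION AS OF {snapshot_id}"
    return f"CREATE TABLE {target} USING iceberg AS {select}"


# Placeholder names and snapshot id, for illustration only.
sql = build_snapshot_copy_sql(
    "prod.db.orders", "test.db.orders", snapshot_id=10963874102873
)
print(sql)

# With a configured SparkSession named `spark`, the statement would be run as:
# spark.sql(sql)
```

This writes only the rows visible in the chosen snapshot into a new table with its own metadata, so the copy carries no older history. Partitioning and table properties are not carried over by this statement and would need to be declared on the target if you want them to match.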

Upvotes: 0
