Reputation: 324
I am converting my table into Parquet file format using Azure Data Factory, and querying the Parquet files with Databricks for reporting. I want to update only the records that were updated in the original SQL Server table. Since the table is very big and the job runs daily, I don't want to truncate and reload the entire table, as that would be costly.
Is there any way I can update those Parquet files without performing a truncate-and-reload operation?
Upvotes: 1
Views: 5395
Reputation: 1
Always go for a soft delete when working with NoSQL; a hard delete is very costly.
Also, with a soft delete, the downstream pipeline can consume the update and act upon it.
Upvotes: -1
Reputation: 324
I have found a workaround to this problem.
Upvotes: 1
Reputation: 87164
Parquet is immutable by default, so the only way to rewrite the data is to rewrite the whole table. However, it becomes possible if you switch to the Delta file format, which supports updating and deleting entries, and also supports the MERGE operation.
You can still use the Parquet format to produce the data, but you then need to use that data to update the Delta table.
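A minimal sketch of this approach, assuming the delta-spark package and an active SparkSession (as on Databricks); the paths and the `id` key column are illustrative assumptions:

```python
# Hedged sketch: MERGE the changed rows that ADF landed as parquet into
# the Delta table that reporting queries run against. Requires the
# delta-spark package and a SparkSession; guarded so the module also
# loads outside a Spark environment.
try:
    from delta.tables import DeltaTable
except ImportError:
    DeltaTable = None  # delta-spark is only available in a Spark environment


def upsert_parquet_changes(spark, updates_parquet_path, delta_table_path):
    """Read the changed rows (parquet) and MERGE them into the Delta table."""
    updates = spark.read.parquet(updates_parquet_path)
    target = DeltaTable.forPath(spark, delta_table_path)
    (target.alias("t")
           .merge(updates.alias("s"), "t.id = s.id")  # match on the key column
           .whenMatchedUpdateAll()       # update rows that already exist
           .whenNotMatchedInsertAll()    # insert rows that are new
           .execute())
```

With this split, ADF keeps producing plain Parquet as before, and the MERGE step reconciles it into the Delta table without a truncate and reload.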
Upvotes: 2