I want to create global unmanaged Databricks tables over data in ADLS and use them from multiple clusters (both automated and interactive). To do that, I first run CREATE TABLE my_table ... and then MSCK REPAIR TABLE my_table. I'm using the Databricks internal Hive metastore.
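A minimal sketch of that workflow in Spark SQL follows. It assumes a Parquet dataset partitioned by a hypothetical `dt` column. The column names and the `abfss://` path are placeholders, not the asker's actual schema or storage account:

```sql
-- Hypothetical schema and ADLS Gen2 path (placeholders); replace with your own.
-- LOCATION makes this an unmanaged (external) table: dropping it leaves the files in place.
CREATE TABLE IF NOT EXISTS my_table (
  id    BIGINT,
  value STRING,
  dt    STRING
)
USING PARQUET
PARTITIONED BY (dt)
LOCATION 'abfss://<container>@<account>.dfs.core.windows.net/path/to/my_table';

-- Scan the table location and register partition directories
-- such as .../dt=2020-01-01/ in the metastore.
MSCK REPAIR TABLE my_table;

-- List the partitions the metastore now knows about.
SHOW PARTITIONS my_table;
```

In Spark SQL, `ALTER TABLE my_table RECOVER PARTITIONS` is an equivalent way to write the repair step.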
Sometimes the result of MSCK REPAIR is not synced across clusters at all, for hours. Cluster #1 sees the partitions immediately, while cluster #2 doesn't see any data for some time.
At other times it is synced, and I can't understand why it doesn't work in the other cases.
Does Databricks use a separate internal Hive metastore per cluster? If so, are there any guarantees about synchronization between clusters?
I believe each Databricks deployment has a single Hive metastore: https://docs.databricks.com/data/metastores/index.html.
So if the metastore is updated immediately, the next most likely cause is that the other cluster has cached the old table metadata, so you aren't seeing the updates. Have you tried running
REFRESH <database>.<table>;
on the cluster that was having the sync issues?
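On the lagging cluster, that check could look like the following sketch. `my_db.my_table` is a placeholder name, and the comments describe what each statement should show if the stale-cache explanation is correct, which is still a hypothesis:

```sql
-- Run on the cluster that does not see the new partitions.
-- Invalidate this cluster's cached data and metadata for the table.
REFRESH TABLE my_db.my_table;

-- Compare these results with the cluster that ran MSCK REPAIR TABLE.
SHOW PARTITIONS my_db.my_table;
SELECT COUNT(*) FROM my_db.my_table;
```

From a Python notebook, `spark.catalog.refreshTable("my_db.my_table")` performs the same invalidation through the PySpark API.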