Reputation: 126
Problem statement: I have an original external table with a row count of 1000. I copied its underlying data to a temp location and created a backup table pointing to that temp location. After running MSCK REPAIR, the counts of the two tables do not match.
Is there any reason for this? Could you please help me understand why?
Upvotes: 1
Views: 1246
Reputation: 858
Answering and clarifying a few things here:
Statistics can be fetched either directly from the metastore or by reading through the underlying data. This is controlled by the property hive.compute.query.using.stats.
a. When it is set to TRUE, Hive answers some queries, such as min, max, and count(1), purely from the statistics stored in the metastore.
b. When it is set to FALSE, Hive spawns a YARN job to read through the data and compute the count. This usually takes longer, depending on the amount of data, because the result is not fetched directly from the statistics stored in the Hive Metastore.
So, if we want correct results when hive.compute.query.using.stats is set to TRUE, we need to make sure the statistics for the table are up to date.
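One way to bring the statistics up to date is Hive's ANALYZE TABLE statement. A minimal sketch, assuming the backup table is called backup_tbl and is partitioned by a column dt (both names are hypothetical and not taken from the question):

```sql
-- backup_tbl and dt are placeholder names used only for illustration.
-- Recompute basic statistics (such as row count) for every partition.
ANALYZE TABLE backup_tbl PARTITION (dt) COMPUTE STATISTICS;

-- For an unpartitioned table, drop the PARTITION clause:
-- ANALYZE TABLE backup_tbl COMPUTE STATISTICS;

-- Inspect the parameters recorded in the metastore, such as numRows where present.
DESCRIBE FORMATTED backup_tbl;
```

After this, a count(1) answered from statistics should reflect the data that was actually scanned by ANALYZE.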
You can check whether the value is set to TRUE or FALSE by running the following in Hive:
SET hive.compute.query.using.stats;
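To see whether stale statistics explain the mismatch, it may help to turn the property off for the session and count both tables from the data itself. A rough sketch, where original_tbl and backup_tbl are hypothetical table names:

```sql
-- original_tbl and backup_tbl are placeholder names used only for illustration.
-- Force Hive to read the data instead of answering from metastore statistics.
SET hive.compute.query.using.stats=false;

SELECT COUNT(1) FROM original_tbl;
SELECT COUNT(1) FROM backup_tbl;
```

If the two counts agree with the property set to FALSE but differ when it is TRUE, the statistics for one of the tables are likely out of date.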
MSCK REPAIR does not do file-level checks. It looks only for directory-level changes. For example, if you have created a partitioned table and manually added a partition directory in HDFS, and you want Hive to be aware of it, MSCK REPAIR serves the purpose.
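As an illustration of that directory-level behaviour, the sketch below registers partition directories for a hypothetical table backup_tbl and then lists what the metastore knows. To my understanding, this step only adds partition metadata and does not by itself recompute row counts, so a count answered from statistics may remain stale until the statistics are refreshed separately.

```sql
-- backup_tbl is a placeholder name used only for illustration.
-- Register partition directories that exist under the table location
-- but are missing from the metastore.
MSCK REPAIR TABLE backup_tbl;

-- List the partitions the metastore is now aware of.
SHOW PARTITIONS backup_tbl;
```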
Upvotes: 1