Reputation: 3696
I am using Hive version 1.2.1. If I run select count(*) from mytable
, I see that it launches a Tez job. So obviously it is not using any table statistics information because ideally number of rows should be fetched from table statistics stored in the Hive Metastore. Also I explicitly checked all the tables in the Hive Metastore, and I didn't find any table name there, which would sort of suggest that it stores table statistics. The only next best relevant table that I could see was TAB_COL_STATS
, but this table only stores column level statistics and that too it only had very handful number of rows, the table had 10 rows. This poses two questions.
TAB_COL_STATS
stores everything, why any column like num_rows is not part of this table schema? I see max,min,avg,num_distinct etc type of columns.Upvotes: 2
Views: 1472
Reputation: 553
TABLE_PARAMS
or PARTITION_PARAMS
.You should set hive.compute.query.using.stats
to true
to make use of metadata for queries like select count(*)...
.
But beforehand, make sure those statistics actually exists.
If not, run analyze table mytable compute statistics
to gather it first.
Or you can set hive.stats.autogather
to true
to enforce the gathering when data is inserted into tables.
Upvotes: 2