Reputation: 6385
I'm trying to compute HIVE table statistic from Apache Spark:
`sqlCtx.sql('ANALYZE TABLE t1 COMPUTE STATISTICS')`
I also execute statement to see what was collected:
sqlCtx.sql('DESC FORMATTED t1')
I can see my stats was collected. However when I execute same staement in HIVE client (Ambari) - there are no statistics displayed. Is it available only to Spark if it's collected by Spark? Does spark store it somewhere else?
Another question.
I also computing stats for all columns in that table:
sqlCtx.sql('ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS c1,c2')
But when I want to see this stats in spark, it failed with unsupported sql statement exception:
sqlCtx.sql('DESC FORMATTED t1 c1')
According to docs it's valid hive queries. What is wrong with it?
Thanks for help.
Upvotes: 1
Views: 1732
Reputation: 1
just uppercase the name of table will be ok.
select param_key, param_value
from TABLE_PARAMS tp, TBLS t
where tp.tbl_id=t.tbl_id and tbl_name = '<table_name>'
and param_key like 'spark.sql.stat%';
Upvotes: 0
Reputation: 6385
Apache Spark stores statistics as "Table parameters". To be able retrieve these stats, we need to connect to HIVE metastore and . execute query like following
select param_key, param_value
from table_params tp, tbls t
where tp.tbl_id=t.tbl_id and tbl_name = '<table_name>'
and param_key like 'spark.sql.stat%';
Upvotes: 2