vvg

Reputation: 6385

Compute HIVE statistics in Apache Spark

I'm trying to compute HIVE table statistics from Apache Spark:

`sqlCtx.sql('ANALYZE TABLE t1 COMPUTE STATISTICS')`

I also execute a statement to see what was collected:

`sqlCtx.sql('DESC FORMATTED t1')`

I can see that my stats were collected. However, when I execute the same statement in the HIVE client (Ambari), no statistics are displayed. Are they available only to Spark if they were collected by Spark? Does Spark store them somewhere else?

Another question.

I'm also computing stats for all columns in that table:

`sqlCtx.sql('ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS c1,c2')`

But when I try to view these stats in Spark, it fails with an unsupported SQL statement exception:

`sqlCtx.sql('DESC FORMATTED t1 c1')`

According to the docs, these are valid Hive queries. What is wrong with them?

Thanks for help.

Upvotes: 1

Views: 1732

Answers (2)

starqiu

Reputation: 1

Just uppercase the metastore table names in the query and it will work:

select param_key, param_value 
from TABLE_PARAMS tp, TBLS t 
where tp.tbl_id=t.tbl_id and tbl_name = '<table_name>' 
and param_key like 'spark.sql.stat%';

Upvotes: 0

vvg

Reputation: 6385

Apache Spark stores statistics as "Table parameters". To retrieve these stats, we need to connect to the HIVE metastore and execute a query like the following:

select param_key, param_value 
from table_params tp, tbls t 
where tp.tbl_id=t.tbl_id and tbl_name = '<table_name>' 
and param_key like 'spark.sql.stat%';
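To try the query without touching a real metastore, here is a minimal Python sketch that runs it against an in-memory SQLite stand-in. The two tables mirror only the metastore columns used by the query above. The sample rows, the exact parameter key names, and the helper `spark_stats` are assumptions made for illustration, so check the keys against what your own metastore returns.

```python
import sqlite3

# In-memory stand-in for the Hive metastore database (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT);
    CREATE TABLE TABLE_PARAMS (TBL_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);
    INSERT INTO TBLS VALUES (1, 't1');
    INSERT INTO TABLE_PARAMS VALUES
        (1, 'spark.sql.statistics.numRows', '100'),    -- assumed key name
        (1, 'spark.sql.statistics.totalSize', '2048'), -- assumed key name
        (1, 'transient_lastDdlTime', '1500000000');    -- non-Spark parameter
""")


def spark_stats(conn, table_name):
    """Return the table parameters whose key matches 'spark.sql.stat%'."""
    query = """
        SELECT tp.PARAM_KEY, tp.PARAM_VALUE
        FROM TABLE_PARAMS tp
        JOIN TBLS t ON tp.TBL_ID = t.TBL_ID
        WHERE t.TBL_NAME = ?
          AND tp.PARAM_KEY LIKE 'spark.sql.stat%'
        ORDER BY tp.PARAM_KEY
    """
    # The table name is bound as a parameter rather than pasted into the SQL.
    return dict(conn.execute(query, (table_name,)).fetchall())


stats = spark_stats(conn, "t1")
print(stats)
```

Against a real metastore you would replace the SQLite connection with a connection to whichever database backs it (MySQL, PostgreSQL, and so on). The query itself stays the same, although the placeholder syntax and identifier case sensitivity depend on that database and its driver.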

Upvotes: 2
