Reputation: 3322
I am trying to optimize a very long and complex impala query which contains multiple CTE. Each CTE is used multiple times. My expectation is that once a CTE is created, I should be able to direct impala that results of this CTE should be re-used in main query as-is instead of SCAN HDFS operation on tables involved in CTE again with the main query. Is this possible? if yes how ?
I am using impalad version 2.1.1-cdh5 RELEASE (build 7901877736e29716147c4804b0841afc4ebc9037) version
Upvotes: 6
Views: 515
Reputation: 1659
Is this possible?
The very purpose of the CTE is to re-use of the results obtained from a preceding query (that uses the with
clause) by a following query, say, SELECT
. So i don't see a reason why it is not possible.
Use Explain
on your query to find out the actual SCAN HDFS details.
For more I/O related insights use profile
as in the official documentation https://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_explain_plan.html#perf_profile
Upvotes: 2
Reputation: 1944
I do not think so. I believe WITH clause does not create any permanent object, it's just for you avoid cluttering the namespace with new tables or views and to make it easier to refactor large, complex queries by reor‐ dering and replacing their individual parts. The queries used in the WITH clause are good candidates to someday become views or to be materialized as summary tables during the ETL process.
Upvotes: 3