Reputation: 1197
Is there a way to identify, total number of jobs required to execute a query.
For Example in the below 2 queries, number of joins and subquery are same but one query would require 2 jobs where as other requires 3
select t1.item_dim_key hive, t2.item_dim_key as monet
from ext_dist_it_dim_key t1
left outer join (select distinct item_dim_key from PO_ITEM_DIM) t2 on t1.item_dim_key=t2.item_dim_key
where t2.item_dim_key is null;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = 20190208020329_258ee4c0-5819-4842-b479-d549c82a0779
**Total jobs = 3**
hive> select t1.item_dim_key hive, t2.item_dim_key as monet
from (select distinct item_dim_key from PO_ITEM_DIM) t1
left outer join ext_dist_it_dim_key t2 on t1.item_dim_key=t2.item_dim_key
where t2.item_dim_key is null;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = 20190208020624_9ea3dc20-ffc8-4461-9516-7a4770d1dd6b
**Total jobs = 2**
Is it possible to know how many jobs would it take for a query to execute? What are the parameters required to calculate number of jobs.
Thanks
Upvotes: 0
Views: 424