Reputation: 439
I am trying to translate some PL/SQL script in hive, and i faced an error with one HiveQL script.
The error is this one :
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected.
I think that the error is coming from this part of script :
SELECT
mag.co_magasin,
dem.id_produit as id_produit_orig,
pnvente.dt_debut_commercial as dt_debut_commercial,
COALESCE(pnvente.id_produit,dem.id_produit) as id_produit,
min(
CASE WHEN dem.co_validation IS NULL THEN 0 ELSE 1 END
) as flg_demarque_valide,
sum(CASE WHEN dem.co_validation IS NULL THEN 0 ELSE cast(dem.mt_revient_ope AS INT) END)
as me_dem_con_prx_cs,
0 as me_dem_inc_prx_cs,
0 as me_dem_prov_stk_cs,
sum(CASE WHEN dem.co_validation IS NULL THEN 0 ELSE cast(dem.qt_demarque AS INT) END)
as qt_dem_con,
0 as qt_dem_inc,
0 as qt_dem_prov_stk,
RANK() OVER (PARTITION BY mag.co_magasin, dem.id_produit ORDER BY pnvente.dt_debut_commercial DESC, COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang
from default.calendrier cal
INNER JOIN default.demarque_mag_jour dem
ON CASE WHEN dem.co_societe = 1 THEN 1 ELSE 2 END = '${hiveconf:in_co_societe}'
AND dem.dt_jour = cal.dt_jour
LEFT OUTER JOIN default.produit_norm pn
ON pn.co_societe = dem.co_societe
AND pn.id_produit = dem.id_produit
LEFT OUTER JOIN default.produit_norm pnvente
ON pnvente.co_societe = pn.co_societe
AND pnvente.co_produit_rfu = pn.co_produit_lip
AND pnvente.co_type_motif='05'
INNER JOIN default.kpi_magasin mag
ON mag.co_societe = '${hiveconf:in_co_societe}'
AND mag.id_magasin = dem.id_magasin
WHERE cal.dt_jour = '${hiveconf:in_dt_jour}'
AND NOT (dem.co_validation IS NULL AND cal.dt_jour > from_unixtime(unix_timestamp()-3*60*60*24, 'ddmmyyyy'))
-- JYP 4.4
AND dem.co_operation_magasin IN ('13','14','32')
GROUP BY
mag.co_magasin,
dem.id_produit,
pnvente.dt_debut_commercial,
COALESCE(pnvente.id_produit,dem.id_produit)
But i can't find any solution on the web.
Thanks for your help :-)
Upvotes: 3
Views: 8459
Reputation: 12801
I have run into the same error. rank()
is case sensitive in hive and the error message give nothing away. Try changing RANK()
to rank()
.
Upvotes: 4
Reputation: 5432
Funny enough, I actually hit this same error today. The problem for me was that one of the columns I was using in my analytic function was not a valid column. W/O knowing what columns your tables provide its impossible for me to prove this is your problem, but you may want to make sure all the columns in your RANK are valid.
Upvotes: 1
Reputation: 5432
My guess is that it has to do with the coalesce inside your rank. Analytic functions work but are more limited in HiveQL. I would try all your joins and sums in an inner query and then do the rank in an outer query. Often times this is required as HiveQL does not always follow the same order of operations you would expect from a typical SQL language. Consider a table based on stock information:
select count(*) as COUNT
from NYSE_STOCKS
where date in ('2001-12-20','2001-12-21','2001-12-24') and exchange = 'NYSE';
Now consider the following query:
select
exchange
, date
, count(*) over (partition by exchange)
from NYSE_STOCKS
where date in ('2001-12-20','2001-12-21','2001-12-24')
group by exchange, date;
You would expect the following results:
EXCHANGE | DATE | COUNT
NYSE | 2001-12-20 | 5199
NYSE | 2001-12-21 | 5199
NYSE | 2001-12-24 | 5199
But you would actually get this in HiveQL:
EXCHANGE | DATE | COUNT
NYSE | 2001-12-20 | 3
NYSE | 2001-12-21 | 3
NYSE | 2001-12-24 | 3
To get the correct results you have to do the group by in an inner query and the analytic function in the outer query:
select
exchange
, date
, count
from (
select
exchange
, date
, count(*) over (partition by exchange) as count
from NYSE_STOCKS
where date in ('2001-12-20','2001-12-21','2001-12-24')
) A
group by exchange, date, count
;
So in summary its always good to think about order of operations when using analytic functions and get the data you are working with to its simplest form before you use the analytic function.
Upvotes: 1
Reputation: 164
Does not look like a valid "Hive" query to me. Remember hive's query language is pretty limited compared to SQL. For example "IN" is not supported. Another exmaple RANK() OVER (...) - that's not supported either. In other words attempting to use RDBMS SQL directly in Hive mostly not work.
Upvotes: -4