user2462699
user2462699

Reputation: 439

Error in Hive : Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected

I am trying to translate some PL/SQL script in hive, and i faced an error with one HiveQL script.

The error is this one :

    FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected.

I think that the error is coming from this part of script :

SELECT
        mag.co_magasin,
        dem.id_produit                                  as id_produit_orig,
        pnvente.dt_debut_commercial                     as dt_debut_commercial,
        COALESCE(pnvente.id_produit,dem.id_produit)     as id_produit,
        min(
          CASE WHEN dem.co_validation IS NULL THEN 0 ELSE 1 END
        )                                               as flg_demarque_valide,
        sum(CASE WHEN dem.co_validation IS NULL THEN 0 ELSE cast(dem.mt_revient_ope AS INT) END)
                                                        as me_dem_con_prx_cs,
        0                                               as me_dem_inc_prx_cs,
        0                                               as me_dem_prov_stk_cs,
        sum(CASE WHEN dem.co_validation IS NULL THEN 0 ELSE cast(dem.qt_demarque AS INT) END)
                                                        as qt_dem_con,
        0                                               as qt_dem_inc,
        0                                               as qt_dem_prov_stk,
        RANK() OVER (PARTITION BY mag.co_magasin, dem.id_produit ORDER BY pnvente.dt_debut_commercial DESC, COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang
      from default.calendrier cal
      INNER JOIN default.demarque_mag_jour dem
      ON  CASE WHEN dem.co_societe = 1 THEN 1 ELSE 2 END = '${hiveconf:in_co_societe}'
      AND dem.dt_jour    = cal.dt_jour
      LEFT OUTER JOIN default.produit_norm pn
      ON  pn.co_societe = dem.co_societe
      AND pn.id_produit = dem.id_produit
      LEFT OUTER JOIN default.produit_norm pnvente
      ON  pnvente.co_societe = pn.co_societe
      AND pnvente.co_produit_rfu = pn.co_produit_lip
      AND pnvente.co_type_motif='05'
      INNER JOIN default.kpi_magasin mag
      ON  mag.co_societe = '${hiveconf:in_co_societe}'
      AND mag.id_magasin = dem.id_magasin
      WHERE cal.dt_jour = '${hiveconf:in_dt_jour}'
      AND NOT (dem.co_validation IS NULL AND cal.dt_jour > from_unixtime(unix_timestamp()-3*60*60*24, 'ddmmyyyy'))
      -- JYP 4.4
      AND dem.co_operation_magasin IN ('13','14','32')
      GROUP BY
        mag.co_magasin,
        dem.id_produit,
        pnvente.dt_debut_commercial,
        COALESCE(pnvente.id_produit,dem.id_produit)

But i can't find any solution on the web.

Thanks for your help :-)

Upvotes: 3

Views: 8459

Answers (4)

JAB
JAB

Reputation: 12801

I have run into the same error. rank() is case sensitive in hive and the error message give nothing away. Try changing RANK() to rank().

Upvotes: 4

ferics2
ferics2

Reputation: 5432

Funny enough, I actually hit this same error today. The problem for me was that one of the columns I was using in my analytic function was not a valid column. W/O knowing what columns your tables provide its impossible for me to prove this is your problem, but you may want to make sure all the columns in your RANK are valid.

Upvotes: 1

ferics2
ferics2

Reputation: 5432

My guess is that it has to do with the coalesce inside your rank. Analytic functions work but are more limited in HiveQL. I would try all your joins and sums in an inner query and then do the rank in an outer query. Often times this is required as HiveQL does not always follow the same order of operations you would expect from a typical SQL language. Consider a table based on stock information:

select count(*) as COUNT
from NYSE_STOCKS
where date in ('2001-12-20','2001-12-21','2001-12-24') and exchange = 'NYSE';

Now consider the following query:

select 
  exchange
  , date
  , count(*) over (partition by exchange) 
from NYSE_STOCKS 
where date in ('2001-12-20','2001-12-21','2001-12-24') 
group by exchange, date;

You would expect the following results:

EXCHANGE | DATE       | COUNT
NYSE     | 2001-12-20 | 5199
NYSE     | 2001-12-21 | 5199
NYSE     | 2001-12-24 | 5199 

But you would actually get this in HiveQL:

EXCHANGE | DATE       | COUNT
NYSE     | 2001-12-20 | 3
NYSE     | 2001-12-21 | 3
NYSE     | 2001-12-24 | 3

To get the correct results you have to do the group by in an inner query and the analytic function in the outer query:

select 
  exchange
  , date
  , count
from (
  select 
        exchange
        , date
        , count(*) over (partition by exchange) as count
  from NYSE_STOCKS 
  where date in ('2001-12-20','2001-12-21','2001-12-24') 
) A
group by exchange, date, count
;

So in summary its always good to think about order of operations when using analytic functions and get the data you are working with to its simplest form before you use the analytic function.

Upvotes: 1

Arijit Banerjee
Arijit Banerjee

Reputation: 164

Does not look like a valid "Hive" query to me. Remember hive's query language is pretty limited compared to SQL. For example "IN" is not supported. Another exmaple RANK() OVER (...) - that's not supported either. In other words attempting to use RDBMS SQL directly in Hive mostly not work.

Upvotes: -4

Related Questions