Reputation: 97
I have the following statement in a query which works fine in Presto:
CASE
WHEN cte_v1.id_agent is not null
THEN cte_v1.id_agent
WHEN cte_v1.id_agent is null
and lead(cte_v1.interaction_channel,1) ignore nulls over(partition by id_ticket, interaction_channel order by cte_v1.interaction_start_time) = 'messaging'
THEN lead(cte_v1.id_agent,1) ignore nulls over(partition by id_ticket, interaction_channel order by cte_v1.interaction_start_time)
END as agent
however when I try to operationalize this query in Hive, I receive the following error 'FAILED: ParseException line 12:39 missing KW_THEN at 'ignore' near 'nulls' in statement'
As mentioned, this query works fine in Presto and gives the correct result by recreating the id_agent column and replaces any null agent ID with the next non-null value
agent | id_agent | interaction_start_time | interaction_end_time | inbound_or_outbound | interaction_channel |
---|---|---|---|---|---|
364920063 | 364920063 | 2020-09-16 6:38:47 | 2020-09-16 6:38:47 | Outbound | messaging |
364920063 | NULL | 2020-09-16 12:37:54 | 2020-09-16 12:37:54 | Inbound | messaging |
364920063 | 364920063 | 2020-09-17 8:02:07 | 2020-09-17 8:02:07 | Outbound | messaging |
191918955 | 191918955 | 2020-09-17 8:16:36 | 2020-09-17 8:25:45 | Outbound | messaging |
191918955 | NULL | 2020-09-17 11:16:47 | 2020-09-17 11:16:47 | Inbound | messaging |
191918955 | NULL | 2020-09-17 16:02:35 | 2020-09-17 16:11:30 | Inbound | messaging |
191918955 | 191918955 | 2020-09-20 2:13:14 | 2020-09-20 2:13:14 | Outbound | messaging |
191918955 | 191918955 | 2020-09-20 2:44:40 | 2020-09-20 2:46:09 | Inbound | messaging |
191918955 | 191918955 | 2020-09-20 7:13:40 | 2020-09-20 7:13:40 | Outbound | messaging |
Upvotes: 1
Views: 1602
Reputation: 38290
Currently there are two open JIRAs both about adding IGNORE NULLs/RESPECT NULLs to LEAD/LAG: HIVE-18145 and HIVE-24869
But it seems your problem can be easily solved using first_value
:
If you want to find next not null value you can use first_value function with second optional parameter TRUE, it means ignore NULLs.
Example (used part of your data example, timestamps fixed to be 24hrs-based to be sortable), you can add partition by
and conditions checking, etc to adopt it, this demo is minimal example to show how it works:
with mytable as
(
select 364920063 id_agent ,'2020-09-16 06:38:47' interaction_start_time union all
select NULL ,'2020-09-16 12:37:54' union all
select 364920063 ,'2020-09-17 08:02:07' union all
select 191918955 ,'2020-09-17 08:16:36' union all
select NULL ,'2020-09-17 11:16:47' union all
select NULL ,'2020-09-17 16:02:35' union all
select 191918955 ,'2020-09-20 02:13:14'
)
select id_agent, interaction_start_time,
--get first not null value in the frame between current row and unbounded following
first_value(id_agent,true) over(order by interaction_start_time rows between current row and unbounded following) agent
from mytable
Result:
id_agent interaction_start_time agent
364920063 2020-09-16 06:38:47 364920063
NULL 2020-09-16 12:37:54 364920063
364920063 2020-09-17 08:02:07 364920063
191918955 2020-09-17 08:16:36 191918955
NULL 2020-09-17 11:16:47 191918955
NULL 2020-09-17 16:02:35 191918955
191918955 2020-09-20 02:13:14 191918955
Upvotes: 1