Padfoot123
Padfoot123

Reputation: 1117

Impala Query to get next date

I have 2 Impala tables.

1st table T1 (additional columns are there but I am interested in only date and day type as weekday):

date       day_type
04/01/2020 Weekday
04/02/2020 Weekday
04/03/2020 Weekday
04/04/2020 Weekend
04/05/2020 Weekend
04/06/2020 Weekday

2nd table T2:

process date       status
A       04/01/2020 finished
A       04/02/2020 finished
A       04/03/2020 finished
A       04/03/2020 run_again

Using Impala queries I have to get the maximum date from second table T2 and get its status. According to the above table 04/03 is the maximum date. If the status is finished on 04/03, then my query should return the next available weekday date from T1 which is 04/06/2020. But if the status is run_again, then the query should return the same date. In the above table, 04/03 has run_again and when my query runs the output should be 04/03/2020 and not 04/06/2020. Please note more than one status is possible for a date. For example, 04/03/2020 can have a row with finished as status and another with run again as status. In this case run again should be prioritized and the query should give 04/03/2020 as output date

What I tried so far: I ran a subquery from second table and got the maximum date and its status. I tried to run a case in my main query and gave T1 as subselect in Case statement but its not working.

Is it possible to achieve this through Impala query?

Upvotes: 2

Views: 748

Answers (2)

yammanuruarun
yammanuruarun

Reputation: 403

There might be multiple solutions and even some better ones considering performance but this is my approach. Hope it helps.

select case when status='run_again' then t2_date else t1_date end as needed_date from t2 cross join (select t1_date from t1 where t1.day_type='Weekday' and t1_date>(select max(t2_date) from t2) order by t1.t1_date limit 1)a where t2_date=(select max(t2_date) from t2);

Upvotes: 1

mazaneicha
mazaneicha

Reputation: 9425

One way to do this is to create a CTE from table T1 instead of a correlated subquery. Something like:

WITH T3 as (
  select t.date date, min(x.date) next_workday
  from T1 t join T1 x
  on t.date < x.date
  where x.day_type = 'Weekday'
  group by t.date
)
select T2.process, T2.date run_date, T2.status,
  case when T2.status = 'finished' then T3.next_workday
  else T3.date
  end next_run_date
from T2 join T3
on T2.date = T3.date
order by T2.process, T2.date;
+---------+------------+-----------+---------------+
| process | run_date   | status    | next_run_date |
+---------+------------+-----------+---------------+
| A       | 2020-04-01 | finished  | 2020-04-02    |
| A       | 2020-04-02 | finished  | 2020-04-03    |
| A       | 2020-04-03 | run again | 2020-04-03    |
+---------+------------+-----------+---------------+

You can then select max from the result instead of ordering.

Upvotes: 3

Related Questions