Chema
Chema

Reputation: 2838

HiveQL Difference between rows, columns based in Dates

I have a table (t_stocks) with data like this:

exchanged,stock_symbol,closing_date,closing_price
NSE,TCS,2009-08-09,2200.1
NSE,TCS,2009-08-10,2300.1
NSE,TCS,2009-08-11,12200.1
NSE,TCS,2009-08-12,22300.1
NSE,TCS,2009-09-09,2200.1
NSE,TCS,2009-09-10,2300.1
NSE,TCS,2009-09-11,12200.1
NSE,TCS,2009-09-12,22300.1
NSE,INFY,2009-08-09,2500.34
NSE,INFY,2009-08-10,1500.34
NSE,INFY,2009-08-09,7500.34
NSE,INFY,2009-08-10,14500.34
NSE,INFY,2009-09-09,2500.34
NSE,INFY,2009-09-10,1500.34
NSE,INFY,2009-09-09,7500.34
NSE,INFY,2009-09-10,14500.34
NSE,TCS,2010-08-09,2200.1
NSE,TCS,2010-08-10,2300.1
NSE,TCS,2010-08-11,12200.1
NSE,TCS,2010-08-12,22300.1
NSE,TCS,2010-09-09,2200.1
NSE,TCS,2010-09-10,2300.1
NSE,TCS,2010-09-11,12200.1
NSE,TCS,2010-09-12,22300.1
NSE,INFY,2010-08-09,2500.34
NSE,INFY,2010-08-10,1500.34
NSE,INFY,2010-08-09,7500.34
NSE,INFY,2010-08-10,14500.34
NSE,INFY,2010-09-09,2500.34
NSE,INFY,2010-09-10,1500.34
NSE,INFY,2010-09-09,7500.34
NSE,INFY,2010-09-10,14500.34
...
...

I would need to write a query which generate a report as follow. exchanged, stock_symbol , closing_date , closing_price , yesterday_closing , diff_yesterday_price (Price difference between Yesterday prices and today prices) with an output like this:

+----------------+-------------------+-------------------+--------------------+------------------------+-----------------------+--+
| exchanged      |     stock_symbol  |     closing_date  |     closing_price  |     yesterday_closing  | diff_yesterday_price  |
+----------------+-------------------+-------------------+--------------------+------------------------+-----------------------+--+
| NSE            | INFY              | 2009-08-09        | 2500.34            | NULL                   | NULL                  |
| NSE            | INFY              | 2009-08-09        | 7500.34            | 2500.34                | -5000                 |
| NSE            | INFY              | 2009-08-10        | 14500.34           | 7500.34                | -7000                 |
| NSE            | INFY              | 2009-08-10        | 1500.34            | 14500.34               | 13000                 |
| NSE            | INFY              | 2009-09-09        | 7500.34            | 1500.34                | -6000                 |
| NSE            | INFY              | 2009-09-09        | 2500.34            | 7500.34                | 5000                  |
| NSE            | INFY              | 2009-09-10        | 14500.34           | 1500.34                | -13000                |
| NSE            | INFY              | 2009-09-10        | 1500.34            | 2500.34                | 1000                  |
| NSE            | INFY              | 2010-08-09        | 7500.34            | 14500.34               | 7000                  |
| NSE            | INFY              | 2010-08-09        | 2500.34            | 7500.34                | 5000                  |
.....
.....

May anyone give me some clues to do this in an efficient way.

Thanks in advance,

Regards.

Upvotes: 0

Views: 33

Answers (1)

zealous
zealous

Reputation: 7503

You can use hive window function lag() to solve this. You can read more about window functions in hive here.

Here is the working DEMO in PostgreSQL, but same query works in HIVE as well.

select
 exchanged,
 stock_symbol,
 closing_date,
 closing_price,
 yesterday_price,
 (yesterday_price - closing_price) as diff_yesterday_price
from
(
    select
        *,
        lag(closing_price) over (partition by stock_symbol order by closing_date) as yesterday_price
    from stockExchange
) la

order by
    stock_symbol,
    closing_date

Upvotes: 2

Related Questions