How to optimize nested innner hive query

Question

I have a table with following stock data where we have couple of columns like date, ticker, open and close(stock prices).

To query this data, I want to know which stock has given the highest margin on particular date. So if I have 516 different stocks, my query should return 516 rows of ticker, date, open, close and a new column Margin(which will be max(close-open)).

| deep_stocks.date_  | deep_stocks.ticker  | deep_stocks.open  | deep_stocks.close  |
+--------------------+---------------------+-------------------+--------------------+--+
| 20100721           | A                   | 27.68             | 27.58              |
| 20100722           | A                   | 27.95             | 28.72              |
| 20100723           | A                   | 28.56             | 29.3               |
| 20100726           | A                   | 29.22             | 29.64              |
| 20100727           | A                   | 29.73             | 28.87              |
| 20100728           | A                   | 28.79             | 28.78              |
| 20100729           | A                   | 28.97             | 28.15              |
| 20100730           | A                   | 27.78             | 27.93              |
| 20100802           | A                   | 28.35             | 28.82              |
| 20100803           | A                   | 28.7              | 27.84              |

I have written a query where my approach was:

Step 1 - Get the difference between Close and Open prices (Inner/Sub query)
Step 2 - Get the maximum of margin for every stock (used group by with max function)
Step 3 - Join the results with Main Table and get the data.

I'll put my query in solution or comments can someone please correct it as it is taking more time. Also I would like to know can we have any other alternative approach.

Pardeep Sharma · Accepted Answer

As already told about my approach please find below query:

SELECT ds.ticker, ds.date_, ds.close, ds.open, ds.Margin FROM 
 (SELECT ticker, date_, close, open, case(close-open)>0 when true then round(close-open,2) else 0 end as Margin FROM DataStocks) ds 
JOIN 
 (SELECT dsIn.ticker, max(dsIn.Margin) mxMargin FROM 
    (select ticker, case(close-open)>0 when true then round(close-open,2) else 0 end as Margin FROM DataStocks ) dsIn group by dsIn.ticker) dsEx 
ON ds.ticker=dsEx.ticker AND ds.Margin=dsEx.mxMargin ORDER BY ds.Margin;

Do we have any other alternatives for this query or can it be possible to optimize it.

How to optimize nested innner hive query

Answers (1)

Related Questions