ngShravil.py
ngShravil.py

Reputation: 5048

How to count the rows which is lesser than a particular row in a Hive table?

Consider the following table in Hive

+------+------+
| id   | res  |
+------+------+
|    1 |   55 |
|    2 |   10 |
|    3 |   89 |
|    4 |  100 |
|    5 |   80 |
|    6 |   55 |
|    7 |   70 |
|    8 |   35 |
|    9 |   46 |
|   10 |   51 |
+------+------+

Now I have to calculate the number of rows which are lesser than the res value in a particular row.

For the above table the output should be

+------+------+
| id   |count |
+------+------+
|    1 |    4 |
|    2 |    0 |
|    3 |    8 |
|    4 |    9 |  
|    5 |    7 |
|    6 |    4 | 
|    7 |    6 |
|    8 |    1 |
|    9 |    2 |
|   10 |    3 |
+------+------+

Upvotes: 0

Views: 104

Answers (4)

deusxmach1na
deusxmach1na

Reputation: 368

Rank is probably the way to go, but here is a fun alternative:

SELECT      mt.id              AS id 
            , mt.res           AS res 
            , COUNT(1) OVER (PARTITION BY NULL ORDER BY mt.res ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1   AS cnt
FROM        my_table mt 

Upvotes: 0

Yohannes
Yohannes

Reputation: 1

You can do the following but remember to remove 1 from result from rank since we are not checking for <= but < (in order words we are excluding current row from count)

select
id,
res,
rank() over (ORDER BY res) -1 as rank
FROM point

ORDER BY  id

Or the long way:

Since Hive does not support CTE (it's based on SQL-92 standard), we will have to use a sub query instead.

Assumptions: I called the table that contains both ID and RES As POINT.

Select id, sum(comparison) as count
From (

Select
a.id,
a.res as res1,
b.res as res2,
Case when a.res > b.res then 1
Else 0
End as comparison

FROM point a
CROSS JOIN point b
) c

GROUP BY id

Please test and let me know.

Upvotes: 0

ozw1z5rd
ozw1z5rd

Reputation: 3208

Voila'

+-----+------+
| id  | _c1  |
+-----+------+
| 1   | 4    |
| 2   | 0    |
| 3   | 8    |
| 4   | 9    |
| 5   | 7    |
| 6   | 4    |
| 7   | 6    |
| 8   | 1    |
| 9   | 2    |
| 10  | 3    |
+-----+------+

It's easy and it's crazy since this query does the cross product. Of course for each row you have to find all the rows which have a value which is lesser, something which looks like a cross product is implicit.

SELECT id, SUM(IF ( c.res1 > c.res2, 1 , 0 )) 
FROM ( 
    SELECT id, a.res AS res1, b.res AS res2 
    FROM test_4 AS a 
         INNER JOIN ( 
            SELECT res
            FROM test_4 
         ) b 
) c 
GROUP BY id;

Upvotes: 0

Ambrish
Ambrish

Reputation: 3677

You can try RANK OVER functionality.

Sample Hiveql

select
  id,
  res,
  rank() over (ORDER BY res) as rank
from
  my_table
order by
  res

Read more here and here.

Upvotes: 3

Related Questions