user3078335
user3078335

Reputation: 781

How to convert Mysql query to Hive

I have this table:

CREATE TABLE ip_logs (
  `ip_address` VARCHAR(11),
  `start_date` VARCHAR(11),
  `end_date` VARCHAR(11),
  `loc_id` INTEGER
);

INSERT INTO ip_logs
  (`ip_address`,`start_date`,`end_date`, `loc_id`)
VALUES
 ('120.0.53.21','2020-01-03','2020-01-09', '5'),
 ('198.5.273.2','2020-01-10','2020-01-14', '4'),
 ('198.5.273.2','2020-01-10','2020-01-14', '4'),
 ('198.5.273.2','2020-01-10','2020-01-14', '4'),
 ('100.36.33.1','2020-02-01','2020-02-02', '4'),
 ('100.36.33.1','2020-02-01','2020-02-02', '4'),
 ('100.36.33.1','2020-02-01','2020-02-02', '4'),
 ('198.0.47.33','2020-02-22','2020-02-24', '2'),
 ('122.8.0.11', '2020-02-25','2020-02-30','4'),
 ('198.0.47.33','2020-03-10','2020-03-17', '2'),
 ('198.0.47.33','2020-03-10','2020-03-17', '2'),
 ('122.8.0.11', '2020-03-18','2020-03-23','4'),
 ('198.5.273.2','2020-03-04','2020-03-09', '3'),
 ('106.25.12.2','2020-03-24','2020-03-30', '1');

I use this query to select the most frequent ip address:

select  (
select ip_address
from ip_logs t2
where t2.loc_id = t1.loc_id
group by ip_address
order by count(*) desc
limit 1)

from ip_logs t1
group by loc_id

This works in mysql8.0. This however does not work in Hive as well. I get this error:

cannot recognize input near 'select' 'ip_address' 'from' in expression specification

Expected output is:

loc_id | ip_address
5        120.0.53.21
4        198.5.273.2
2        198.0.47.33
3        198.5.273.2
1        106.25.12.2

Upvotes: 1

Views: 68

Answers (1)

Fahmi
Fahmi

Reputation: 37473

You can try using row_number() window function

select * from
(
select ip_address,loc_id,count(*) as frequency
row_number() over(partition by loc_id order by count(*) desc) as rn
from ip_logs group by ip_address,loc_id
)A where rn=1

Upvotes: 1

Related Questions