Reputation: 781
I have this table:
CREATE TABLE ip_logs (
`ip_address` VARCHAR(11),
`start_date` VARCHAR(11),
`end_date` VARCHAR(11),
`loc_id` INTEGER
);
INSERT INTO ip_logs
(`ip_address`,`start_date`,`end_date`, `loc_id`)
VALUES
('120.0.53.21','2020-01-03','2020-01-09', '5'),
('198.5.273.2','2020-01-10','2020-01-14', '4'),
('198.5.273.2','2020-01-10','2020-01-14', '4'),
('198.5.273.2','2020-01-10','2020-01-14', '4'),
('100.36.33.1','2020-02-01','2020-02-02', '4'),
('100.36.33.1','2020-02-01','2020-02-02', '4'),
('100.36.33.1','2020-02-01','2020-02-02', '4'),
('198.0.47.33','2020-02-22','2020-02-24', '2'),
('122.8.0.11', '2020-02-25','2020-02-30','4'),
('198.0.47.33','2020-03-10','2020-03-17', '2'),
('198.0.47.33','2020-03-10','2020-03-17', '2'),
('122.8.0.11', '2020-03-18','2020-03-23','4'),
('198.5.273.2','2020-03-04','2020-03-09', '3'),
('106.25.12.2','2020-03-24','2020-03-30', '1');
I use this query to select the most frequent ip address:
select (
select ip_address
from ip_logs t2
where t2.loc_id = t1.loc_id
group by ip_address
order by count(*) desc
limit 1)
from ip_logs t1
group by loc_id
This works in mysql8.0. This however does not work in Hive as well. I get this error:
cannot recognize input near 'select' 'ip_address' 'from' in expression specification
Expected output is:
loc_id | ip_address
5 120.0.53.21
4 198.5.273.2
2 198.0.47.33
3 198.5.273.2
1 106.25.12.2
Upvotes: 1
Views: 68
Reputation: 37473
You can try using row_number()
window function
select * from
(
select ip_address,loc_id,count(*) as frequency
row_number() over(partition by loc_id order by count(*) desc) as rn
from ip_logs group by ip_address,loc_id
)A where rn=1
Upvotes: 1