user1486507
user1486507

Reputation: 709

HiveQL, Hive SQL select date range

It seems simple in SQL but I'm having troubles using HiveQL with date range.

I have a dataset like this:

hive> describe logs;
 id string,
 ts string,
 app_id int

hive> select * from logs limit 5;
1389    2014-10-05 13:57:01 12
1656    2014-10-06 03:57:59 15
1746    2014-10-06 10:58:25 19
1389    2014-10-09 08:57:01 12
1656    2014-10-10 01:57:59 15

My goal is to get the distinct id for the last 3 days. The best thing is to read the current system time and get the unique id of last 3 days, but not sure where I need to put "unix_timestamp()". Considered that the log is recorded realtime and there's today's date in ts, I tried to use this query (first approach)

hive > SELECT distinct id FROM logs HAVING to_date(ts) > date_sub(max(ts), 3) and to_date(ts) <  max(ts);
FAILED: SemanticException [Error 10025]: Line 1:45 Expression not in GROUP BY key 'ts'

If I add group by 'ts' like below, it spits up this error:

hive> SELECT distinct ext FROM pas_api_logs group by ts HAVING to_date(ts) > date_sub(max(ts), 7) and to_date(ts) <  max(ts);
FAILED: SemanticException 1:47 SELECT DISTINCT and GROUP BY can not be in the same query. Error encountered near token 'ts'

After the numerous try, the last approach made was this, studied after [similar topic][1].

Select distinct id from (SELECT * FROM logs JOIN logs ON (max(logs.ts) = to_date(logs.ts))
 UNION ALL
 SELECT * FROM logs JOIN logs ON (to_date(logs.ts) = date_sub(max(logs.ts), 1))
 UNION ALL 
 SELECT * FROM logs JOIN logs ON (to_date(logs.ts) = date_sub(max(logs.ts), 2)));

Apparently this doesn't work either. Can someone shed some lights on this?

Upvotes: 1

Views: 3810

Answers (1)

Ajaykishan
Ajaykishan

Reputation: 36

The required result can be obtained by using this statement:
select distinct id from logs where DATEDIFF(from_unixtime(unix_timestamp()),ts) <= 3;

Hope it helps !

Upvotes: 2

Related Questions