Reputation: 55
My access logs database stores time as epoch and extracts year month and day as integers. Further, the partitioning of the database is based on the extracted Y/m/d and I have a 35 day retention.
If I run this query:
select *
from mydb
where year in (2017, 2018)
and month in (12, 1)
and day in (31, 1)
I also realize that I can do something like this:
select *
from mydb
where (year = 2017 and month = 12 and day = 31)
or (year = 2018 and month = 1 and day = 1)
But what I am really looking for is this: a good way to write a query where I give the year month and day number as the start and then a fourth value (number of days +) and then get all the data for 12/31/2017 + 5 days for example.
Is there a native way in SQL to accomplish this? I have an enormous data set and if I don't specify the days and have to rely on the epoch to do this, the query takes forever. I also have no influence over the partitioning configuration.
Upvotes: 0
Views: 344
Reputation: 35613
With Impala as the dbms and SQL dialect you will be able to use common table expressions but not recursion. In addition there may be problems inserting parameters as well.
Below is an untested suggestion that will require you to locate some function alternatives. First it generates a set of rows with an integer from 0 to 999 (in the example). It is quite easy to expand the number of rows if required. From those rows it is possible to add the number of days to a timestamp literal using date_add(timestamp startdate, int days/interval expression)
and then with year(timestamp date)
and month(timestamp date)
and day(timestamp date)
see Date and Time functions create the columns needed to match to your data.
Overall then you should be able to build a common table expression that has columns for year, month, day that cover a wanted range, and that you can inner join to your source table and thereby implementing a date range filter.
The code below was produced using T-SQL (SQL Server) and it can be tested here.
-- produce a set of integers, adjust to suit needed number of these
;WITH
cteDigits AS (
SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
)
, cteTally AS (
SELECT
d1s.digit
+ d10s.digit * 10
+ d100s.digit * 100 /* add more like this as needed */
-- + d1000s.digit * 1000 /* add more like this as needed */
AS num
FROM cteDigits d1s
CROSS JOIN cteDigits d10s
CROSS JOIN cteDigits d100s /* add more like this as needed */
-- CROSS JOIN cteDigits d1000s /* add more like this as needed */
)
, DateRange AS (
select
num
, dateadd(day,num,'20181227') dt
, year(dateadd(day,num,'20181227')) yr
, month(dateadd(day,num,'20181227')) mn
, day(dateadd(day,num,'20181227')) dy
from cteTally
where num < 10
)
select
*
from DateRange
I think these are the Impala equivalents for the function calls used above:
, DateRange AS (
select
num
, date_add(to_timestamp('20181227','yyyyMMdd'),num) dt
, year( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) yr
, month( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) mn
, day( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) dy
from cteTally
where num < 10
Hopefully you can work out how to use these. Ultimately the purpose is to use the generated date range like so:
select * from mydb t
inner join DateRange on t.year = DateRange.yr and t.month = DateRange.mn and t.day = DateRange.dy
original post
Well in the absence of knowing what database to propose solutions for, here is a suggestion using SQL Server:
This suggestion involves a recursive common table expression, which may then be used as an inner join to your source data to limit the results to a date range.
--Sql Server 2014 Express Edition
--https://rextester.com/l/sql_server_online_compiler
declare @yr as integer = 2018
declare @mn as integer = 12
declare @dy as integer = 27
declare @du as integer = 10
;with CTE as (
select
datefromparts(@yr, @mn, @dy) as dt
, @yr as yr
, @mn as mn
, @dy as dy
union all
select
dateadd(dd,1,cte.dt)
, datepart(year,dateadd(dd,1,cte.dt))
, datepart(month,dateadd(dd,1,cte.dt))
, datepart(day,dateadd(dd,1,cte.dt))
from cte
where cte.dt < dateadd(dd,@du-1,datefromparts(@yr, @mn, @dy))
)
select
*
from cte
This produces the following result:
+----+---------------------+------+----+----+
| | dt | yr | mn | dy |
+----+---------------------+------+----+----+
| 1 | 27.12.2018 00:00:00 | 2018 | 12 | 27 |
| 2 | 28.12.2018 00:00:00 | 2018 | 12 | 28 |
| 3 | 29.12.2018 00:00:00 | 2018 | 12 | 29 |
| 4 | 30.12.2018 00:00:00 | 2018 | 12 | 30 |
| 5 | 31.12.2018 00:00:00 | 2018 | 12 | 31 |
| 6 | 01.01.2019 00:00:00 | 2019 | 1 | 1 |
| 7 | 02.01.2019 00:00:00 | 2019 | 1 | 2 |
| 8 | 03.01.2019 00:00:00 | 2019 | 1 | 3 |
| 9 | 04.01.2019 00:00:00 | 2019 | 1 | 4 |
| 10 | 05.01.2019 00:00:00 | 2019 | 1 | 5 |
+----+---------------------+------+----+----+
and:
select * from mydb t
inner join cte on t.year = cte.yr and t.month = cte.mn and t.day = cte.dy
Instead of a recursive common table expression a table of integers may be used instead (or use a set unioned select queries to generate a set of integers) - often known as a tally table. The method one chooses will depend of dbms type and version being used.
Again, depending on database, it may be more efficient to persist the result seen above as a temporary table and add an index to that.
Upvotes: 1