mikernova
mikernova

Reputation: 55

Querying across months and days

My access logs database stores time as epoch and extracts year month and day as integers. Further, the partitioning of the database is based on the extracted Y/m/d and I have a 35 day retention.

If I run this query:

select * 
from mydb 
where year in (2017, 2018) 
  and month in (12, 1) 
  and day in (31, 1) 

I also realize that I can do something like this:

select * 
from mydb 
where (year = 2017 and month = 12 and day = 31) 
   or (year = 2018 and month = 1 and day = 1)

But what I am really looking for is this: a good way to write a query where I give the year month and day number as the start and then a fourth value (number of days +) and then get all the data for 12/31/2017 + 5 days for example.

Is there a native way in SQL to accomplish this? I have an enormous data set and if I don't specify the days and have to rely on the epoch to do this, the query takes forever. I also have no influence over the partitioning configuration.

Upvotes: 0

Views: 344

Answers (1)

Paul Maxwell
Paul Maxwell

Reputation: 35613

With Impala as the dbms and SQL dialect you will be able to use common table expressions but not recursion. In addition there may be problems inserting parameters as well.

Below is an untested suggestion that will require you to locate some function alternatives. First it generates a set of rows with an integer from 0 to 999 (in the example). It is quite easy to expand the number of rows if required. From those rows it is possible to add the number of days to a timestamp literal using date_add(timestamp startdate, int days/interval expression) and then with year(timestamp date) and month(timestamp date) and day(timestamp date) see Date and Time functions create the columns needed to match to your data.

Overall then you should be able to build a common table expression that has columns for year, month, day that cover a wanted range, and that you can inner join to your source table and thereby implementing a date range filter.

The code below was produced using T-SQL (SQL Server) and it can be tested here.

-- produce a set of integers, adjust to suit needed number of these
;WITH
  cteDigits AS (
      SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
      SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
      )
, cteTally AS (
      SELECT 
              d1s.digit 
            + d10s.digit * 10
            + d100s.digit * 100  /* add more like this as needed */
            -- + d1000s.digit * 1000  /* add more like this as needed */
            AS num
      FROM cteDigits d1s
      CROSS JOIN cteDigits d10s
      CROSS JOIN cteDigits d100s /* add more like this as needed */
      -- CROSS JOIN cteDigits d1000s /* add more like this as needed */
      )
, DateRange AS (
    select 
          num
        , dateadd(day,num,'20181227')        dt
        , year(dateadd(day,num,'20181227'))  yr
        , month(dateadd(day,num,'20181227')) mn
        , day(dateadd(day,num,'20181227'))   dy
    from cteTally
    where num < 10
    )
select
*
from DateRange

I think these are the Impala equivalents for the function calls used above:

, DateRange AS (
    select 
          num
        ,        date_add(to_timestamp('20181227','yyyyMMdd'),num)    dt
        ,  year( date_add(to_timestamp('20181227','yyyyMMdd'),num) )  yr
        , month( date_add(to_timestamp('20181227','yyyyMMdd'),num) )  mn
        ,   day( date_add(to_timestamp('20181227','yyyyMMdd'),num) )  dy
    from cteTally
    where num < 10

Hopefully you can work out how to use these. Ultimately the purpose is to use the generated date range like so:

select * from mydb t
inner join DateRange on t.year = DateRange.yr and t.month = DateRange.mn and t.day = DateRange.dy

original post

Well in the absence of knowing what database to propose solutions for, here is a suggestion using SQL Server:

This suggestion involves a recursive common table expression, which may then be used as an inner join to your source data to limit the results to a date range.

--Sql Server 2014 Express Edition
--https://rextester.com/l/sql_server_online_compiler

declare @yr as integer = 2018
declare @mn as integer = 12
declare @dy as integer = 27
declare @du as integer = 10

;with CTE as (
    select
           datefromparts(@yr, @mn, @dy) as dt
         , @yr as yr
         , @mn as mn
         , @dy as dy

    union all

    select
           dateadd(dd,1,cte.dt)
         , datepart(year,dateadd(dd,1,cte.dt)) 
         , datepart(month,dateadd(dd,1,cte.dt)) 
         , datepart(day,dateadd(dd,1,cte.dt)) 
    from cte
    where cte.dt < dateadd(dd,@du-1,datefromparts(@yr, @mn, @dy))
    )
select
*
from cte

This produces the following result:

+----+---------------------+------+----+----+
|    |         dt          |  yr  | mn | dy |
+----+---------------------+------+----+----+
|  1 | 27.12.2018 00:00:00 | 2018 | 12 | 27 |
|  2 | 28.12.2018 00:00:00 | 2018 | 12 | 28 |
|  3 | 29.12.2018 00:00:00 | 2018 | 12 | 29 |
|  4 | 30.12.2018 00:00:00 | 2018 | 12 | 30 |
|  5 | 31.12.2018 00:00:00 | 2018 | 12 | 31 |
|  6 | 01.01.2019 00:00:00 | 2019 |  1 |  1 |
|  7 | 02.01.2019 00:00:00 | 2019 |  1 |  2 |
|  8 | 03.01.2019 00:00:00 | 2019 |  1 |  3 |
|  9 | 04.01.2019 00:00:00 | 2019 |  1 |  4 |
| 10 | 05.01.2019 00:00:00 | 2019 |  1 |  5 |
+----+---------------------+------+----+----+

and:

select * from mydb t
inner join cte on t.year = cte.yr and t.month = cte.mn and t.day = cte.dy

Instead of a recursive common table expression a table of integers may be used instead (or use a set unioned select queries to generate a set of integers) - often known as a tally table. The method one chooses will depend of dbms type and version being used.

Again, depending on database, it may be more efficient to persist the result seen above as a temporary table and add an index to that.

Upvotes: 1

Related Questions