Doug Fir
Doug Fir

Reputation: 21212

Selecting a sequence in SQL

There seems to be a few blog posts on this topic but the solutions really are not so intuitive. Surely there's a "Canonical" way?

I'm using Teradata SQL.

How would I select

  1. A range of number
  2. A date range

E.g.

SELECT 1:10 AS Nums

SELECT 1-1-2010:5-1-2014 AS Dates1

The result would be 10 rows (1 - 10) in the first SELECT query and ~(365 * 3.5) rows in the second?

Upvotes: 0

Views: 3127

Answers (3)

Dimitar Nentchev
Dimitar Nentchev

Reputation: 141

sequence 1 to 10

sel sum (1) over (ROWS UNBOUNDED PRECEDING) as seq_val
from sys_calendar.CALENDAR
qualify  row_number () over (order by 1)<=10

Upvotes: 0

dnoeth
dnoeth

Reputation: 60462

In Teradata you can use the existing sys_calendar to get those dates:

SELECT calendar_date
FROM sys_calendar.CALENDAR
WHERE calendar_date BETWEEN DATE '2010-01-01' AND DATE '2014-05-01';

Note: DATE '2010-01-01' is the only recommended way to write a date in Teradata

There's probably another custom calendar for the specific business needs of your company, too. Everyone will have access rights to it.

You might also use this for the range of numbers:

SELECT day_of_calendar
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 10;

But you should check Explain to see if the estimated number of rows is correct. sys_calendar is a kind of template and day_of_calendar is a calculated column, so no statistics exists on that and Explain will return an estimated number of 14683 (20 percent of the number of rows in that table) instead of 10. If you use it in additional joins the optimizer might do a bad plan based on that totally wrong number.

Note: If you use sys_calendar you are limited to a maximum of 73414 rows, dates between 1900-01-01 and 2100-12-31 and numbers between 1 and 73414, your business calendar might vary.

Gordon Linoff's recursive query is not really efficient in Teradata, as it's a sequential row-by-row processing in a parallel database (each loop is an "all-AMPs step" in Explain) and the optimizer doesn't know how many rows will be returned.

If you need those ranges regularly you might consider creating a numbers table, I usually got one with a million rows or I use my calendar with the full range of 10000 years :-)

--DROP TABLE nums;
CREATE TABLE nums(n INT NOT NULL PRIMARY KEY CHECK (n BETWEEN 0 AND 999999));

INSERT INTO Nums
WITH cte(n) AS
 (
   SELECT day_of_calendar - 1
   FROM sys_calendar.CALENDAR
   WHERE day_of_calendar BETWEEN 1 AND 1000
 ) 
SELECT
   t1.n +
   t2.n * 1000 
FROM cte t1 CROSS JOIN cte t2;

COLLECT STATISTICS COLUMN(n) ON Nums;

The COLLECT STATS is the most important step to get correct estimates. Now it's a simple

SELECT n FROM nums WHERE n BETWEEN 1 AND 10;

There's also a nice UDF on GitHub for creating sequences which is easy to use:

SELECT DATE '2010-01-01' + SEQUENCE 
FROM TABLE(gen_sequence(0,DATE '2014-05-01' - DATE '2010-01-01')) AS t;

SELECT SEQUENCE 
FROM TABLE(gen_sequence(1,10)) AS t;

But it's usually hard to convince your DBA to install any C-UDFs and the number of rows returned is unknown again.

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1269633

The "canonical" way to do this in SQL is using recursive CTEs, which the more recent versions of Teradata support.

For your first example:

with recursive nums(n) as (
      select 1 as n
      union all
      select n + 1
      from nums
      where n < 10
     )
select *
from nums;

You can do something similar for dates.

EDIT:

You can also do this by using row_number() and an existing table:

with nums(n) as (
      select n
      from (select row_number() over (order by col) as n
            from ExstingTable t
           ) t
      where n <= 10
     )
select *
from nums;

ExistingTable is just any table with enough rows. The best choice of col is the primary key.

with digits(n) as (
      select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
      select 6 union all select 7 union all select 8 union all select 9 union all select 10
     )
select *
from digits;

If your version of Teradata supports multiple CTEs, you can build on the above:

with digits(n) as (
      select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
      select 6 union all select 7 union all select 8 union all select 9 union all select 10
     ),
     nums(n) as (
      select d1.n*100 + d2.n*10 + d3.n
      from digits d1 cross join digits d2 cross join digits d3
     )
select *
from nums;

Upvotes: 3

Related Questions