Yakir Shlezinger
Yakir Shlezinger

Reputation: 191

"Get the first row of each day" SQL query

I am the end-user of a SQL Server DB with multiple lines ordered by date. Lets take this DB as an example:

Amount Date
23.5 20210512010220111
24 20210512020220111
30 20210512030220111
1.2 20210513011020111
1000 20210513020220111
24 20210514100220111
240 20210514100220111

Be advised that the date is just a long that represent the date in the format: yyyymmddhhMMssfff.

I am trying to create a SQL query like this: "Get the first row of each day" So for the above example the result will be:

Amount Date
23.5 20210512010220111
1.2 20210513011020111
24 20210514100220111

I saw this example in multiple sources: https://learnsql.com/cookbook/how-to-select-the-first-row-in-each-group-by-group/

The problem is when I tried it that was way too slow for me the DB is storing hundreds of millions of rows (with 9 columns each)

A couple of weeks ago I used a similar(ish) query for a daily min, avg, max:

SELECT MIN(Amount), AVG(Amount), MAX(Amount)
FROM table
GROUP BY Date/1000000000

That worked quickly enough, if there is something similar to FIRST(Amount) that would be great. Just to clarify, I am just an end-user, I have no saying over the overall structure of the DB.

Edit: This is the query I tried and was too slow:

WITH added_row_number AS (
  SELECT
    *,
    ROW_NUMBER() OVER(PARTITION BY Date/1000000000 ORDER BY Date ASC) AS row_number
  FROM table
)
SELECT
  *
FROM added_row_number
WHERE row_number = 1;

EDIT 2:

I took inspiration from all the answers here and after some trial and error I found this query worked fast enough (adjusted query to suit the example DB, not the actual query I used):

SELECT OrgTable.*
FROM (
     SELECT *
     FROM Table
     -- WHERE statement on a uniq key column
) As OrgTable
INNER JOIN (
     SELECT MIN(Date) As ModTimeMin
     FROM Table
     -- WHERE statement on a uniq key column
     GROUP BY DATE/1000000000 
     --this sub query gets the time of the first transaction of each day 
) As MinTable
ON OrgTable.Date = MinTable.Date --this joins the relevant data to the times table
ORDER BY OrgTable.Date ASC

Thank you.

Upvotes: 0

Views: 3247

Answers (3)

MikeY
MikeY

Reputation: 21

The following may perform better than your original query as they use an existance check (or top clause) instead of reading all data, calculating row_number for all rows, then scanning the result. It will perform best if there is already an index on the [Date] column.

As you have duplicate [Date] values, the query could return different results each time it is executed unless a unique key column is included in the query.

create table  #t(
    [Id] int
    ,[Amount] decimal(10,2)
    ,[Date] bigint
);

create index idx_Date on #t([Date]);

insert #t values
(1, 23.5, 20210512010220111)
,(2, 24, 20210512020220111)
,(3, 30, 20210512030220111)
,(4, 1.2, 20210513011020111)
,(5, 1000, 20210513020220111)
,(6, 24, 20210514100220111)
,(7, 240, 20210514100220111);


-- Assuming that you have a unique key available
select
    *
    ,t1.[Date]/1000000000
from #t t1
where not exists (
    select *
    from #t t2
    where t1.[Date]/1000000000 = t2.[Date]/1000000000
    and (
        t2.[Date] < t1.[Date]
        or (
            t2.[Date] = t1.[Date]
            and t2.Id < t1.Id
        )
    )
);

--This is a kludge if you don't have a unique key available and may perform worse than your original query. Don't use this without testing it in a non production system first.
select
    *
    ,t1.[Date]/1000000000
from #t t1
where not exists (
    select *
    from #t t2
    where t1.[Date]/1000000000 = t2.[Date]/1000000000
    and (
        t2.[Date] < t1.[Date]
        or (
            t2.[Date] = t1.[Date]
            and t2.%%physloc%% < t1.%%physloc%% -- %%physloc%% is the File/Page/Slot for the row
        )
    )
);

-- Alternatively using top. Assumes a unique column is available
select
    t1.*
    ,t1.[Date]/1000000000
from #t t1
cross apply (
    select top 1 *
    from #t t2
    where t1.[Date]/1000000000 = t2.[Date]/1000000000
    order by Date, Id
) t2
where t1.Id = t2.Id

drop table #t;

Upvotes: 1

Mark Schultheiss
Mark Schultheiss

Reputation: 34168

Feels like you can use a cte here and cast the date strings as actual DATE values (the last query) here is my sample using test data (you did not post the column types so I guessed)

Not 100% clear about the "date" column if it is an actual datetime, you can just cast as a date

DECLARE @mytable TABLE (
    Amount NUMERIC(10,2) NOT NULL,  
    [Date] VARCHAR(30) NOT NULL
);
INSERT INTO @mytable(Amount,[Date])
VALUES
    (3.5, '20210512010220111'),
    (24.0,  '20210512020220111'),
    (30.0,'20210512030220111'),
    (1.2,   '20210513011020111'),
    (1000.0,    '20210513020220111'),
    (24.0,  '20210514100220111'),
    (240.0, '20210514100220111')
;

SELECT 
    [Amount], 
    MAX(CAST( LEFT([Date], 8)  AS DATE)) AS NewDate
FROM @mytable
GROUP BY AMOUNT 
ORDER BY MAX(CAST( LEFT([Date], 8)  AS DATE)) DESC;


/* this is what we want perhaps: */
;
WITH cte AS (
  SELECT
    Amount,
    CAST(LEFT([Date], 8) AS DATE) AS MyDate,
    ROW_NUMBER() OVER(PARTITION BY CAST(LEFT([Date], 8) AS DATE) ORDER BY CAST(LEFT([Date], 8) AS DATE) DESC) AS row_number
  FROM @mytable
)
SELECT
  *
FROM cte
WHERE row_number = 1;

Upvotes: 0

kamster
kamster

Reputation: 144

try something to affect

SELECT t.yyyy_mm_dd_date, table.*
FROM table
JOIN (
  SELECT SUBSTRING (Date, 1, 8) as yyyy_mm_dd_date, MIN(Date) as min_date
  FROM table
) t
ON t.min_date = table.Date

in general i find SQL queries to run fast with joins and aggregations (especially over their indices), so if can translate query to use those believe generally should run fairly fast

Upvotes: 1

Related Questions