Reputation: 191
I am the end user of a SQL Server database with a table of many rows ordered by date. Let's take this table as an example:
Amount | Date |
---|---|
23.5 | 20210512010220111 |
24 | 20210512020220111 |
30 | 20210512030220111 |
1.2 | 20210513011020111 |
1000 | 20210513020220111 |
24 | 20210514100220111 |
240 | 20210514100220111 |
Be advised that the date is just a long that represents the date in the format yyyyMMddHHmmssfff.
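Since the value is a fixed 17-digit integer, integer division by 1,000,000,000 strips the nine time-of-day digits (HHmmssfff) and leaves the 8-digit day key. A quick Python sketch, just to illustrate the arithmetic the queries below rely on:

```python
# The 17-digit timestamp encodes yyyyMMddHHmmssfff; integer-dividing by
# 10**9 drops the 9 trailing digits (the time of day) and leaves yyyymmdd.
ts = 20210512010220111
day_key = ts // 1_000_000_000
print(day_key)  # 20210512
```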
I am trying to write a SQL query like this: "get the first row of each day". So for the above example the result would be:
Amount | Date |
---|---|
23.5 | 20210512010220111 |
1.2 | 20210513011020111 |
24 | 20210514100220111 |
I saw this example in multiple sources: https://learnsql.com/cookbook/how-to-select-the-first-row-in-each-group-by-group/
The problem is that when I tried it, it was far too slow for me: the database stores hundreds of millions of rows (with 9 columns each).
A couple of weeks ago I used a similar(ish) query for a daily min, avg, max:
SELECT MIN(Amount), AVG(Amount), MAX(Amount)
FROM table
GROUP BY Date/1000000000
That ran quickly enough, so if there is something similar to FIRST(Amount), that would be great. Just to clarify, I am only an end user; I have no say over the overall structure of the DB.
Edit: This is the query I tried and was too slow:
WITH added_row_number AS (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY Date/1000000000 ORDER BY Date ASC) AS row_number
FROM table
)
SELECT
*
FROM added_row_number
WHERE row_number = 1;
EDIT 2:
I took inspiration from all the answers here, and after some trial and error I found a query that ran fast enough (adjusted below to suit the example table; it is not the actual query I used):
SELECT OrgTable.*
FROM (
SELECT *
FROM Table
-- WHERE clause on a unique key column
) AS OrgTable
INNER JOIN (
SELECT MIN(Date) AS ModTimeMin
FROM Table
-- WHERE clause on a unique key column
GROUP BY Date/1000000000
-- this subquery gets the time of the first transaction of each day
) AS MinTable
ON OrgTable.Date = MinTable.ModTimeMin -- this joins each per-day minimum time back to its row
ORDER BY OrgTable.Date ASC
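For readers who want to experiment with this approach outside SQL Server, the MIN(Date)-per-day join can be sketched with Python's built-in sqlite3 module. This is a stand-in sketch, not the actual T-SQL: table and column names mirror the example above, and SQLite's `/` on integers behaves like SQL Server's integer division here.

```python
import sqlite3

# Stand-in sketch of the final query: find each day's minimum Date,
# then join back to the table to fetch the matching rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Amount REAL, Date INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (23.5, 20210512010220111), (24, 20210512020220111),
    (30, 20210512030220111), (1.2, 20210513011020111),
    (1000, 20210513020220111), (24, 20210514100220111),
    (240, 20210514100220111),
])
rows = conn.execute("""
    SELECT t.Amount, t.Date
    FROM t
    JOIN (SELECT MIN(Date) AS min_date
          FROM t
          GROUP BY Date / 1000000000) m
      ON t.Date = m.min_date
    ORDER BY t.Date
""").fetchall()
# The last day has two rows sharing the minimum Date, so both come back;
# a unique key column is needed as a tie-breaker to get exactly one row per day.
```

Note that on this sample data the join returns four rows for the three days, which illustrates the duplicate-Date caveat raised in the answer below.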
Thank you.
Upvotes: 0
Views: 3247
Reputation: 21
The following may perform better than your original query, as they use an existence check (or TOP clause) instead of reading all the data, calculating ROW_NUMBER for every row, and then scanning the result. They will perform best if there is already an index on the [Date] column.
As you have duplicate [Date] values, the query could return different results each time it is executed unless a unique key column is included as a tie-breaker.
create table #t(
[Id] int
,[Amount] decimal(10,2)
,[Date] bigint
);
create index idx_Date on #t([Date]);
insert #t values
(1, 23.5, 20210512010220111)
,(2, 24, 20210512020220111)
,(3, 30, 20210512030220111)
,(4, 1.2, 20210513011020111)
,(5, 1000, 20210513020220111)
,(6, 24, 20210514100220111)
,(7, 240, 20210514100220111);
-- Assuming that you have a unique key available
select
*
,t1.[Date]/1000000000
from #t t1
where not exists (
select *
from #t t2
where t1.[Date]/1000000000 = t2.[Date]/1000000000
and (
t2.[Date] < t1.[Date]
or (
t2.[Date] = t1.[Date]
and t2.Id < t1.Id
)
)
);
--This is a kludge if you don't have a unique key available and may perform worse than your original query. Don't use this without testing it in a non production system first.
select
*
,t1.[Date]/1000000000
from #t t1
where not exists (
select *
from #t t2
where t1.[Date]/1000000000 = t2.[Date]/1000000000
and (
t2.[Date] < t1.[Date]
or (
t2.[Date] = t1.[Date]
and t2.%%physloc%% < t1.%%physloc%% -- %%physloc%% is the File/Page/Slot for the row
)
)
);
-- Alternatively using top. Assumes a unique column is available
select
t1.*
,t1.[Date]/1000000000
from #t t1
cross apply (
select top 1 *
from #t t2
where t1.[Date]/1000000000 = t2.[Date]/1000000000
order by t2.[Date], t2.Id
) t2
where t1.Id = t2.Id;
drop table #t;
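The NOT EXISTS pattern with a unique-key tie-breaker can also be tried outside SQL Server. A stand-in sketch using Python's built-in sqlite3 with the same sample data and Id tie-breaker as above (not the actual T-SQL, and without %%physloc%%, which is SQL Server specific):

```python
import sqlite3

# NOT EXISTS pattern: keep a row only if no other row in the same day has
# an earlier Date, or the same Date with a lower Id (the tie-breaker).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER, Amount REAL, Date INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 23.5, 20210512010220111), (2, 24, 20210512020220111),
    (3, 30, 20210512030220111), (4, 1.2, 20210513011020111),
    (5, 1000, 20210513020220111), (6, 24, 20210514100220111),
    (7, 240, 20210514100220111),
])
rows = conn.execute("""
    SELECT Id, Amount, Date
    FROM t t1
    WHERE NOT EXISTS (
        SELECT 1 FROM t t2
        WHERE t1.Date / 1000000000 = t2.Date / 1000000000
          AND (t2.Date < t1.Date
               OR (t2.Date = t1.Date AND t2.Id < t1.Id))
    )
    ORDER BY Date
""").fetchall()
# Exactly one row per day survives, even for the day with a duplicated Date.
```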
Upvotes: 1
Reputation: 34168
Feels like you can use a CTE here and cast the date strings to actual DATE values (the last query). Here is my sample using test data (you did not post the column types, so I guessed).
It's not 100% clear what the "Date" column is; if it is an actual datetime, you can just cast it to a date.
DECLARE @mytable TABLE (
Amount NUMERIC(10,2) NOT NULL,
[Date] VARCHAR(30) NOT NULL
);
INSERT INTO @mytable(Amount,[Date])
VALUES
(23.5, '20210512010220111'),
(24.0, '20210512020220111'),
(30.0,'20210512030220111'),
(1.2, '20210513011020111'),
(1000.0, '20210513020220111'),
(24.0, '20210514100220111'),
(240.0, '20210514100220111')
;
SELECT
[Amount],
MAX(CAST( LEFT([Date], 8) AS DATE)) AS NewDate
FROM @mytable
GROUP BY AMOUNT
ORDER BY MAX(CAST( LEFT([Date], 8) AS DATE)) DESC;
/* this is what we want perhaps: */
;
WITH cte AS (
SELECT
Amount,
CAST(LEFT([Date], 8) AS DATE) AS MyDate,
ROW_NUMBER() OVER(PARTITION BY CAST(LEFT([Date], 8) AS DATE) ORDER BY [Date] ASC) AS row_number
FROM @mytable
)
SELECT
*
FROM cte
WHERE row_number = 1;
Upvotes: 0
Reputation: 144
Try something to this effect:
SELECT t.yyyy_mm_dd_date, table.*
FROM table
JOIN (
SELECT SUBSTRING(CAST([Date] AS VARCHAR(17)), 1, 8) AS yyyy_mm_dd_date, MIN([Date]) AS min_date
FROM table
GROUP BY SUBSTRING(CAST([Date] AS VARCHAR(17)), 1, 8)
) t
ON t.min_date = table.Date
In general I find SQL queries run fast with joins and aggregations (especially over indexed columns), so if you can translate the query to use those, I believe it should generally run fairly fast.
Upvotes: 1