Reputation: 1
I have a Table with records like this:
Id | ForeignId | Date Created | Status |
---|---|---|---|
1 | ZZ01 | 2021-01-20 | failed |
2 | ZZ02 | 2021-03-24 | passed |
3 | ZZ01 | 2021-08-09 | passed |
4 | ZZ03 | 2022-01-12 | failed |
5 | ZZ01 | 2022-04-23 | passed |
I am trying to write a query that returns the latest DateCreated and Status of every Distinct ForeignId in the table above. The query I have been writing uses MAX() to find the latest date of a record, and uses FIRST_VALUE() on Status to get only the latest value as well, and later using GROUP BY all the columns I put in the SELECT except for . The problem is that I I keep getting this error:
Column 'dbo.T.DateSubmitted' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Now I don't understand why this error keeps popping up when I am using all my Selects in the Group By or in the case of the failing DateSubmitted in an Aggregate Function.
SELECT [T].[Id],
[T].[ForeignId],
MAX([T].[DateSubmitted]) AS [Date Submitted],
FIRST_VALUE([S].[Status]) OVER (ORDER BY [T].[DateSubmitted] DESC) AS [Status]
FROM Table as [T]
GROUP BY [T].[Id], [T].[ForeignId], [T].[Status]
ORDER BY [T].[ForeignId];
As you can see from my code above I am using the MAX() function for my DateSubmitted select, and all my leftover selects in the GROUP BY, what am I missing? Why do I need to include my DateSubmitted in the GROUP BY when I am only selecting the MAX() value?
What I want is to return only the latest date and status for each distinct ForeignId, since every ForeignId can have multiple Status and Dates I want only the latest values, I almost get it with the query above, but if I am unable to use the GROUP BY and MAX() I can get the latest column information I have repeated ForeignId instances. For example I would receive ForeignId ZZ01 3 times instead of once. Which is why I need the GROUP BY to work. An example of the bad output when unable to use GROUP BY:
Id | ForeignId | Date Created | Status |
---|---|---|---|
1 | ZZ01 | 2021-04-23 | passed |
2 | ZZ01 | 2021-04-23 | passed |
3 | ZZ01 | 2021-04-23 | passed |
4 | ZZ02 | 2022-03-24 | passed |
5 | ZZ03 | 2022-01-12 | failed |
Expected Result:
Id | ForeignId | Date Created | Status |
---|---|---|---|
1 | ZZ01 | 2021-04-23 | passed |
2 | ZZ02 | 2022-03-24 | passed |
3 | ZZ03 | 2022-01-12 | failed |
Upvotes: 0
Views: 1029
Reputation: 71263
Unfortunately, FIRST_VALUE
is not available as an aggregate function, only as a window function. Therefore, the compiler understands it as operating over the resultset after aggregating. So the columns referred to must be in the GROUP BY
, but cannot be on non-aggregated and non-grouped columns.
You can use it over an aggregate function
SELECT [SLM].[Id],
[SLM].[CompanySiteId],
MAX([QF].[DateSubmitted]) AS [Date Submitted],
FIRST_VALUE([QF].[Status]) OVER (ORDER BY MAX([QF].[DateSubmitted]) DESC) AS [QC Status]
FROM [dbo].[SiteListMember] AS [SLM]
JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
ON [SAS].[SiteListMemberId] = [SLM].[Id]
JOIN [dbo].[QCForm] AS [QF]
ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
GROUP BY [SLM].[Id], [SLM].[CompanySiteId], [QF].[Status]
ORDER BY [SLM].[CompanySiteId];
This may not give the results you want, it's hard to say without sample data.
Or you need to push it down into a derived table (subquery). You can do this over the whole joined set
SELECT t.[Id],
t.[CompanySiteId],
MAX(t.[DateSubmitted]) AS [Date Submitted],
t.[QC Status]
FROM (
SELECT [SLM].[Id],
[SLM].[CompanySiteId],
[QF].[DateSubmitted] AS [Date Submitted],
FIRST_VALUE([QF].[Status]) OVER (ORDER BY [QF].[DateSubmitted] DESC) AS [QC Status]
FROM [dbo].[SiteListMember] AS [SLM]
JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
ON [SAS].[SiteListMemberId] = [SLM].[Id]
JOIN [dbo].[QCForm] AS [QF]
ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
) t
GROUP BY t.[Id], t.[CompanySiteId], t.[Status]
ORDER BY t.[CompanySiteId];
Or you can do it just over the one table and join it after, by adding a PARTITION BY
clause.
SELECT [SLM].[Id],
[SLM].[CompanySiteId],
MAX([QF].[DateSubmitted]) AS [Date Submitted],
QC.[QC Status]
FROM [dbo].[SiteListMember] AS [SLM]
JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
ON [SAS].[SiteListMemberId] = [SLM].[Id]
JOIN (
SELECT *,
FIRST_VALUE([QF].[Status]) OVER (PARTITION BY [QF].[SiteAssessmentStagingId]
ORDER BY [QF].[DateSubmitted] DESC) AS [QC Status]
FROM [dbo].[QCForm] AS [QF]
) ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
GROUP BY [SLM].[Id], [SLM].[CompanySiteId], [QF].[Status]
ORDER BY [SLM].[CompanySiteId];
I would advise you to only quote column and table names which need it, and to avoid such names if at all possible. Lots of
[]
is annoying to read.
Upvotes: 1