furgil99
furgil99

Reputation: 1

GROUP BY Not Recognizing Aggregate Function

I have a Table with records like this:

Id ForeignId Date Created Status
1 ZZ01 2021-01-20 failed
2 ZZ02 2021-03-24 passed
3 ZZ01 2021-08-09 passed
4 ZZ03 2022-01-12 failed
5 ZZ01 2022-04-23 passed

I am trying to write a query that returns the latest DateCreated and Status of every Distinct ForeignId in the table above. The query I have been writing uses MAX() to find the latest date of a record, and uses FIRST_VALUE() on Status to get only the latest value as well, and later using GROUP BY all the columns I put in the SELECT except for . The problem is that I I keep getting this error:

Column 'dbo.T.DateSubmitted' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Now I don't understand why this error keeps popping up when I am using all my Selects in the Group By or in the case of the failing DateSubmitted in an Aggregate Function.

SELECT [T].[Id],
       [T].[ForeignId],
       MAX([T].[DateSubmitted]) AS [Date Submitted],
       FIRST_VALUE([S].[Status]) OVER (ORDER BY [T].[DateSubmitted] DESC) AS [Status]
      FROM Table as [T]

GROUP BY [T].[Id], [T].[ForeignId], [T].[Status]
ORDER BY [T].[ForeignId];

As you can see from my code above I am using the MAX() function for my DateSubmitted select, and all my leftover selects in the GROUP BY, what am I missing? Why do I need to include my DateSubmitted in the GROUP BY when I am only selecting the MAX() value?

What I want is to return only the latest date and status for each distinct ForeignId, since every ForeignId can have multiple Status and Dates I want only the latest values, I almost get it with the query above, but if I am unable to use the GROUP BY and MAX() I can get the latest column information I have repeated ForeignId instances. For example I would receive ForeignId ZZ01 3 times instead of once. Which is why I need the GROUP BY to work. An example of the bad output when unable to use GROUP BY:

Id ForeignId Date Created Status
1 ZZ01 2021-04-23 passed
2 ZZ01 2021-04-23 passed
3 ZZ01 2021-04-23 passed
4 ZZ02 2022-03-24 passed
5 ZZ03 2022-01-12 failed

Expected Result:

Id ForeignId Date Created Status
1 ZZ01 2021-04-23 passed
2 ZZ02 2022-03-24 passed
3 ZZ03 2022-01-12 failed

Upvotes: 0

Views: 1029

Answers (1)

Charlieface
Charlieface

Reputation: 71263

Unfortunately, FIRST_VALUE is not available as an aggregate function, only as a window function. Therefore, the compiler understands it as operating over the resultset after aggregating. So the columns referred to must be in the GROUP BY, but cannot be on non-aggregated and non-grouped columns.

You can use it over an aggregate function

SELECT [SLM].[Id],
       [SLM].[CompanySiteId],
       MAX([QF].[DateSubmitted]) AS [Date Submitted],
       FIRST_VALUE([QF].[Status]) OVER (ORDER BY MAX([QF].[DateSubmitted]) DESC) AS [QC Status]
FROM [dbo].[SiteListMember] AS [SLM]
    JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
        ON [SAS].[SiteListMemberId] = [SLM].[Id]
    JOIN [dbo].[QCForm] AS [QF]
        ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
GROUP BY [SLM].[Id], [SLM].[CompanySiteId], [QF].[Status]
ORDER BY [SLM].[CompanySiteId];

This may not give the results you want, it's hard to say without sample data.

Or you need to push it down into a derived table (subquery). You can do this over the whole joined set

SELECT t.[Id],
       t.[CompanySiteId],
       MAX(t.[DateSubmitted]) AS [Date Submitted],
       t.[QC Status]
FROM (
    SELECT [SLM].[Id],
           [SLM].[CompanySiteId],
           [QF].[DateSubmitted] AS [Date Submitted],
           FIRST_VALUE([QF].[Status]) OVER (ORDER BY [QF].[DateSubmitted] DESC) AS [QC Status]
    FROM [dbo].[SiteListMember] AS [SLM]
    JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
        ON [SAS].[SiteListMemberId] = [SLM].[Id]
    JOIN [dbo].[QCForm] AS [QF]
        ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
    WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
) t
GROUP BY t.[Id], t.[CompanySiteId], t.[Status]
ORDER BY t.[CompanySiteId];

Or you can do it just over the one table and join it after, by adding a PARTITION BY clause.

SELECT [SLM].[Id],
       [SLM].[CompanySiteId],
       MAX([QF].[DateSubmitted]) AS [Date Submitted],
       QC.[QC Status]
FROM [dbo].[SiteListMember] AS [SLM]
    JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
        ON [SAS].[SiteListMemberId] = [SLM].[Id]
    JOIN (
        SELECT *,
          FIRST_VALUE([QF].[Status]) OVER (PARTITION BY [QF].[SiteAssessmentStagingId]
              ORDER BY [QF].[DateSubmitted] DESC) AS [QC Status]
        FROM [dbo].[QCForm] AS [QF]
      ) ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
GROUP BY [SLM].[Id], [SLM].[CompanySiteId], [QF].[Status]
ORDER BY [SLM].[CompanySiteId];

I would advise you to only quote column and table names which need it, and to avoid such names if at all possible. Lots of [] is annoying to read.

Upvotes: 1

Related Questions