Reputation: 195
I have what should be a simple query for any database and which always runs in MySQL but not in SQL Server
select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
group by assetid
order by ts desc
The error is:
column tagalerts.id is invalid in the select list because it is not contained in either an aggregate function or the group by clause.
It is not a simple case of adding tagalerts.id
to the group by
clause because the error repeats for ts
and for assetid
etc, implying that all the selects need to be in a group or in aggregate functions... either of which will result in a meaningless and inaccurate result.
Splitting the select into a subquery to sort and group correctly (which again works fine with MySQL, as you would expect) makes matters worse
SELECT * from
(select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
order by ts desc
)T1
group by assetid
the order by clause is invalid in views, inline functions, derived tables and expressions unless TOP etc is used
the 'correct output' should be
id ts assetid zonename battlevel
1234 a datetime 1569 Reception 0
3182 another datetime 1572 Reception 0
Either I am reading SQL Server's rules entirely wrong or this is a major flaw with that database.
How can I write this to work on both systems?
Upvotes: 1
Views: 1216
Reputation: 416039
I strongly suspect this query is WRONG even in MySql.
We're missing a lot of details (sample data, and we don't know which table all of the columns belong to), but what I do know is you're grouping by assetid
, where it looks like one assetid
value could have more than one ts
(timestamp) value in the group. It also looks like you're counting on the order by ts desc
to ensure both that you see recent timestamps in the results first and that each assetid
group uses the most recent possible ts
timestamp for that group.
MySql only guarantees the former, not the latter. Nothing in this query guarantees that each assetid
is using the most recent timestamp available. You could be seeing the wrong timestamps, and then also using those wrong timestamps for the order by
. This is the problem the Sql Server rule is there to stop. MySql violates the SQL standard to allow you to write that wrong query.
Instead, you need to look at each column and either add it to the group by
(best when all of the values are known to be the same, anyway) or wrap it in an aggregrate function like MAX()
, MIN()
, AVG()
, etc, so there is a deterministic result for which value from the group is used.
If all of the values for a column in a group are the same, then there's no problem adding it to the group by
. If the values are different, you want to be precise about which one is chosen for the result set.
While I'm here, the tagalerts, node
join syntax has been obsolete for more than 20 years now. It's also good practice to use an alias with every table and prefix every column with the alias. I mention these to explain why I changed it for my code sample below, though I only prefix columns where I am confident in which table the column belongs to.
This query should run on both databases:
SELECT ta.assetid, MAX(ta.id) "id", MAX(ta.ts) "ts",
MAX(n.zonename) "zonename", MAX(battlevel) "battlevel"
FROM tagalerts ta
INNER JOIN node n ON ta.nodeid = n.id
WHERE ack = 0
GROUP BY ta.assetid
ORDER BY ts DESC
There is also a concern here the results may be choosing values from different records in the joined node
table. So if battlevel
is part of the node
table, you might see a result that matches a zonename
with a battlevel
that never occurs in any record in the data. In Sql Server, this is easily fixed by using APPLY
to match only one node
record to each tagalert
. MySql doesn't support this (APPLY or an equivalent has been in every other major database since at least 2012), but you can simulate with it in this case with two JOINs, where the first join is a subquery that uses GROUP BY to determine values will uniquely identify the needed node
record, and second join is to the node
table to actually produce that record. Unfortunately, we need to know more about the tables in question to actually write this code for you.
Upvotes: 1
Reputation: 29667
In most databases you can't just include columns that aren't in the GROUP BY
without using an aggregate function.
MySql is an exception to that. But MS SQL Server isn't.
So you could keep that GROUP BY
with only the "assetid".
But then use the appropriate aggregate functions for all the other columns.
Also, use the JOIN syntax for heaven's pudding sake.
A SQL like select * from table1, table2 where table1.id2 = table2.id
is using a syntax from the previous century.
SELECT
MAX(node.id) AS id,
MAX(ta.ts) AS ts,
ta.assetid,
MAX(node.zonename) AS zonename,
MAX(ta.battlevel) AS battlevel
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
GROUP BY ta.assetid
ORDER BY ta.ts DESC;
Another trick to use in MS SQL Server is the window function ROW_NUMBER.
But this is probably not what you need.
Example:
SELECT id, ts, assetid, zonename, battlevel
FROM
(
SELECT
node.id,
ta.ts,
ta.assetid,
node.zonename,
ta.battlevel,
ROW_NUMBER() OVER (PARTITION BY ta.assetid ORDER BY ta.ts DESC) AS rn
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
) q
WHERE rn = 1
ORDER BY ts DESC;
Upvotes: 2