Reputation: 195

Correct format for Select in SQL Server

I have what should be a simple query for any database and which always runs in MySQL but not in SQL Server

select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
group by assetid
order by ts desc

The error is:

column tagalerts.id is invalid in the select list because it is not contained in either an aggregate function or the group by clause.

It is not a simple case of adding tagalerts.id to the group by clause because the error repeats for ts and for assetid etc, implying that all the selects need to be in a group or in aggregate functions... either of which will result in a meaningless and inaccurate result.

Splitting the select into a subquery to sort and group correctly (which again works fine with MySQL, as you would expect) makes matters worse

SELECT * from
(select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
order by ts desc 
)T1
group by assetid

the order by clause is invalid in views, inline functions, derived tables and expressions unless TOP etc is used

the 'correct output' should be

id     ts                 assetid     zonename     battlevel
1234   a datetime         1569        Reception    0
3182   another datetime   1572        Reception    0

Either I am reading SQL Server's rules entirely wrong or this is a major flaw with that database.

How can I write this to work on both systems?

Upvotes: 1

Answers (2)

Joel Coehoorn

Reputation: 416039

I strongly suspect this query is WRONG even in MySql.

We're missing a lot of details (sample data, and we don't know which table all of the columns belong to), but what I do know is you're grouping by assetid, where it looks like one assetid value could have more than one ts (timestamp) value in the group. It also looks like you're counting on the order by ts desc to ensure both that you see recent timestamps in the results first and that each assetid group uses the most recent possible ts timestamp for that group.

MySql only guarantees the former, not the latter. Nothing in this query guarantees that each assetid is using the most recent timestamp available. You could be seeing the wrong timestamps, and then also using those wrong timestamps for the order by. This is the problem the Sql Server rule is there to stop. MySql violates the SQL standard to allow you to write that wrong query.

Instead, you need to look at each column and either add it to the group by (best when all of the values are known to be the same, anyway) or wrap it in an aggregrate function like MAX(), MIN(), AVG(), etc, so there is a deterministic result for which value from the group is used.

If all of the values for a column in a group are the same, then there's no problem adding it to the group by. If the values are different, you want to be precise about which one is chosen for the result set.

While I'm here, the tagalerts, node join syntax has been obsolete for more than 20 years now. It's also good practice to use an alias with every table and prefix every column with the alias. I mention these to explain why I changed it for my code sample below, though I only prefix columns where I am confident in which table the column belongs to.

This query should run on both databases:

SELECT ta.assetid, MAX(ta.id) "id", MAX(ta.ts) "ts",
    MAX(n.zonename) "zonename", MAX(battlevel) "battlevel"
FROM tagalerts ta
INNER JOIN node n ON ta.nodeid = n.id
WHERE ack = 0
GROUP BY ta.assetid
ORDER BY ts DESC

There is also a concern here the results may be choosing values from different records in the joined node table. So if battlevel is part of the node table, you might see a result that matches a zonename with a battlevel that never occurs in any record in the data. In Sql Server, this is easily fixed by using APPLY to match only one node record to each tagalert. MySql doesn't support this (APPLY or an equivalent has been in every other major database since at least 2012), but you can simulate with it in this case with two JOINs, where the first join is a subquery that uses GROUP BY to determine values will uniquely identify the needed node record, and second join is to the node table to actually produce that record. Unfortunately, we need to know more about the tables in question to actually write this code for you.

Upvotes: 1

LukStorms

Reputation: 29667

In most databases you can't just include columns that aren't in the GROUP BY without using an aggregate function.

MySql is an exception to that. But MS SQL Server isn't.

So you could keep that GROUP BY with only the "assetid".
But then use the appropriate aggregate functions for all the other columns.

Also, use the JOIN syntax for heaven's pudding sake.
A SQL like select * from table1, table2 where table1.id2 = table2.id is using a syntax from the previous century.

SELECT
MAX(node.id) AS id,
MAX(ta.ts) AS ts,
ta.assetid,
MAX(node.zonename) AS zonename,
MAX(ta.battlevel) AS battlevel
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
GROUP BY ta.assetid
ORDER BY ta.ts DESC;

Another trick to use in MS SQL Server is the window function ROW_NUMBER.
But this is probably not what you need.

Example:

SELECT id, ts, assetid, zonename, battlevel
FROM
(
  SELECT
   node.id,
   ta.ts,
   ta.assetid,
   node.zonename,
   ta.battlevel,
   ROW_NUMBER() OVER (PARTITION BY ta.assetid ORDER BY ta.ts DESC) AS rn
  FROM tagalerts AS ta
  JOIN node ON node.id = ta.nodeid
  WHERE ta.ack = 0
) q
WHERE rn = 1
ORDER BY ts DESC;

Upvotes: 2

Correct format for Select in SQL Server

Answers (2)

Related Questions