Reputation: 439
I have an SQL query that reads
SELECT DISTINCT [NR] AS K_ID
FROM [DB].[staging].[TABLE]
WHERE [N]=1 and [O]='XXX' and [TYPE] in ('1_P', '2_I')
Since I'm saving the result in a CSV file (via Python Pandas) which is under version control I've noticed that the order of the result changes every time I run the query. In order to eliminate the Python part here I ran the query in MS SQL Server Management Studio, where I'm also observing a different order with every attempt.
It doesn't matter in my case, but: Is it correct, that the result of the query can be ordered differently with every execution? And if so, is there a way to make the order "deterministic"?
Upvotes: 0
Views: 609
Reputation: 415880
SQL database are based on a relational algebra set theory concept, where what you think of as tables are more formally called unordered relations. Unless you specify an ORDER BY
, the database is free to return the data is whatever order is convenient.
This order might match an index, rather than the order on disk. It might also start in the middle of the data, if the database can take advantage of work already in progress for another query to reduce total reads between the two (Enterprise Edition will do this).
Worse, even the order on disk might change. If there's no primary key, the database can even move a page around to help things run more efficiently.
In other words, if the order matters (and it usually does), specify an ORDER BY
clause.
Upvotes: 2
Reputation: 1269953
SQL queries return results as an unordered set, unless the outermost query has an order by
.
On smaller amounts of data, the results look repeatable. However, on larger systems -- and particularly on parallel systems -- the ordering may be based on hashing algorithms, when nodes complete, and congestion on the network (among other factors). So, you can in fact see different orderings each time you run.
Upvotes: 1