Deterministic/Volatile function in SQL

Question

Let's take a basic deterministic function and a non-deterministic one:

ABS(2)
NOW()

What about the third case of something that may change but we're not sure, such as:

SELECT
    ABS(2) -- deterministic
  , NOW()  -- not
  , getTableCount(otherTbl) -- function that does a 'SELECT count(1) FROM table'
FROM
    table

Basically, if a row is inserted or deleted, the subselect's value will change. So would that one be considered deterministic? The result should always be the same...unless the underlying data is changed, so it's almost like a third case. Or, is volatile/non-deterministic just taken to mean 'if it ever changes, ever, ever, ever, under any circumstances, then it's volatile.' ?

brunoff · Accepted Answer

There are different interpretations for determinism, even when restricted to the SQL functions domain. It depends on what determinism consumer needs and assumes.

The usual definition of determinism is that a deterministic function always return the same value when confronted with same input argument values for its parameters.

If the function consumes state, it would implictly consider it as an extra input paramenter. The original function(p1,...pn) would become function(p1,...pn,state). But in this case if two different states are compared, then the inputs would not be the same, so we couldn't talk about determinism anymore. Knowing this, we will use the terms state-sensitive-determinism and state-insensitive-determinism to differentiate those cases.

Our state-insensitive-determinism is equivalent of PostgreSQL's IMMUTABLE (PostgreSQL is a good comparinson as it avoids using the term determinism to avoid confusion, as it is possible to see in postgresql docs). In this case, the function always returns the same value no matter the state (example select 1+2). It is the most strict form of determinism and consumers usually take it for granted - query optimizers for example can substitute them by their result (select 1+2 would become select 3). In those cases, the state does not influence the result. So, even if we put state as an extra parameter, the function remains resulting the same.

When the result does not change facing the same state but risk changing otherwise we have our state-sensitive-determinism or PostgreSQL's STABLE (example select v,sum(v) over () from tbl where v>1000;). Determinism here is on a gray area. A query optimizer consumer sees it as deterministic because since query lives a well defined state, at least in transactionable databases, it is fine to calculate it only once instead of many times because future calculations would result the same. But a materialized calculated column or index can't accept this same function as deterministic because a little change in the state would turn invalid all its pre-calculated and stored values. In this scenario resides the OP's getTableCount(otherTbl). For a query optimizer its deterministism is enough to avoid extra calculations, for materialized calculated values it is not enough and can't be accepted as a source of value for being written. If we use the state as an extra parameter, the result may change between different states.

If we consume a value that is generated based on some uncontrolled state like random() (at least when we don't choose seed and pseudorandom function), then we can't achieve determinism. In PostgreSQL's terms, this would be VOLATILE. A VOLATILE is undeterministic by nature because it can have different values even in the same table scan, as it is the case of random() (For time related functions see Postgres now() timestamp doesn't change, when script works, the time may be the transaction time or can be the query time, what would impact your view of what is deterministic).

MySQL have different keywords, NOT DETERMINISTIC DETERMINISTIC, READS SQL DATA MODIFIES SQL DATA (similiar to PostgreSQL's LEAKPROOF), NO SQL CONTAINS SQL as seen on mysql docs, with the same objective of PostgreSQL - giving hints to the specific consumer, be it a query optimizer or a materialized value, of whether it would adapt its behaviour depending on its interpretation of determinism. The database vendors probably leave this responsibility to the users because leaving them the responsibility of determining the causal graph what influences what would be complex and problematic.

When vendors talk about determinsim they will probably be talking about one of those that we said. In sqlserver docs microsoft says that state must be the same, so they are probably talking about our state-sensitive-determinism. In sqlite docs otherwise it is taken the state-insensitive-determinism approach, where functions that must result equally even in different states to be considered deterministic, because they would follow stricter rules. Oracle implicitly follows the same sqlite flavor in their docs.

Our transactionable databases will eventually use some mechanism like MVCC to hold state in a transaction. In this case we could think the transactionTimestamp as a input to our functions. But if we take more complex cases like distributed databases, then our determinism can be harder to achieve and eventualy it would have to consider consensus algorithms.

Deterministic/Volatile function in SQL

Answers (1)

Related Questions