General Comparisons vs Value Comparisons

Question

Why does XQuery treat the following expressions differently?

() = 2 returns false (general comparison)
() eq 2 returns an empty sequence (value comparison)

Jens Erat · Accepted Answer

This behavior is defined in the XQuery specification. For XQuery 3.0, see section 3.7.1, Value Comparisons:

Atomization is applied to the operand. The result of this operation is called the atomized operand.

If the atomized operand is an empty sequence, the result of the value comparison is an empty sequence, and the implementation need not evaluate the other operand or apply the operator. However, an implementation may choose to evaluate the other operand in order to determine whether it raises an error.

Thus, if you compare two singleton sequences (a single atomic value is identical to a sequence containing just that value), you receive true or false, as expected:

1 eq 2 is false
2 eq 2 is true
(1) eq 2 is false
(2) eq 2 is true
(2) eq (2) is true
and so on

But if one or both operands are the empty sequence, you receive the empty sequence instead:

() eq 2 is ()
2 eq () is ()
() eq () is ()

This behavior lets you pass empty sequences through, so they can serve as a kind of null value. As @adamretter added in the comments, the empty sequence () has an effective boolean value of false, so even if you run something like if (() eq 2) ..., you won't observe anything surprising.
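A minimal sketch of that pass-through behavior, assuming hypothetical item elements whose price attribute may be missing (the element and attribute names are made up for illustration):

```xquery
xquery version "3.0";

(: Hypothetical data: the second item has no price attribute. :)
let $items := (
  <item name="a" price="2"/>,
  <item name="b"/>
)
for $item in $items
(: xs:integer(()) yields (), so the value comparison yields () for item "b". :)
let $matches := xs:integer($item/@price) eq 2
return
  (: The effective boolean value of () is false, so "b" takes the else branch. :)
  if ($matches)
  then concat($item/@name, ": price is 2")
  else concat($item/@name, ": no match")
```

Item "a" should take the then branch and item "b" the else branch, without any explicit check for the missing attribute. The explicit xs:integer cast is there because an untyped attribute value cannot be compared to an integer with eq directly.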

If either operand is a sequence of more than one item, a type error (err:XPTY0004) is raised.
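As a rough illustration, the type error can be caught with try/catch, which requires XQuery 3.0 or later. This is only a sketch: the variable name is arbitrary, and a processor that detects the problem during static analysis may report the error before the query runs, in which case the catch clause never gets a chance to handle it.

```xquery
xquery version "3.0";

(: Many processors predeclare this prefix; declaring it explicitly is harmless. :)
declare namespace err = "http://www.w3.org/2005/xqt-errors";

let $values := (1, 2)
return
  try {
    (: The left operand has two items, so eq cannot be applied. :)
    $values eq 2
  } catch err:XPTY0004 {
    "type error: eq accepts at most one item per operand"
  }
```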

A general comparison, $sequence1 = $sequence2, tests whether any item in $sequence1 has an equal item in $sequence2. Both operands are atomized here as well, but since the semantics already cover sequences of arbitrary length, an empty operand needs no special rule.
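A few expressions that illustrate this "any pair matches" reading; the expected results are given in the comments:

```xquery
(
  (1, 2, 3) = (3, 4),   (: true: 3 occurs on both sides :)
  (1, 2) = (3, 4),      (: false: no pair of items is equal :)
  () = 2,               (: false: there is no pair to test :)
  () = (),              (: false: same reason :)
  () != 2,              (: false: != is existential too, so an empty side never matches :)
  (1, 2) != (1, 2)      (: true: the pair 1, 2 is unequal :)
)
```

The last two lines show that != is not the negation of = for sequences; use not($a = $b) if that is what you mean.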

Why?

The difference comes from the requirements imposed by the operators' signatures. If you compare sequences of arbitrary length in a set-based manner, there is no reason to include a special case for empty sequences: if either operand is empty, no pair of items can be equal, so the comparison is false by definition.

For the operators that compare single values, one has to consider the case where an empty sequence is passed. The decision was not to raise an error, but to return a value whose effective boolean value is false: the empty sequence. This allows using the empty sequence as a kind of null value when a value is unknown; a comparison with an unknown value can never be true, but it need not be false either. If you need to, you can test the result with empty(...): if it is empty, one of the compared values was unknown; otherwise, a false result means the values are simply different. In Java and other languages, a null value would be used to achieve similar results; Haskell has the Maybe type (Data.Maybe).
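One way to make that three-way distinction explicit is sketched below. The function name local:compare and its result strings are made up for illustration; they are not part of any library.

```xquery
xquery version "3.0";

(: Hypothetical helper: reports whether two optional integers are
   equal, different, or not comparable because one is missing. :)
declare function local:compare(
  $a as xs:integer?,
  $b as xs:integer?
) as xs:string {
  let $result := $a eq $b
  return
    if (empty($result)) then "unknown"   (: at least one operand was () :)
    else if ($result) then "equal"
    else "different"
};

(
  local:compare(2, 2),    (: "equal" :)
  local:compare(1, 2),    (: "different" :)
  local:compare((), 2)    (: "unknown" :)
)
```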
