Reputation: 14472
In SAP HANA 1.0 SPS 12 we want to partition a table by ValidationAreaID and by VersionValidTo.
This is no problem so far.
But since a comparison with null is supposed to be faster than a timestamp I want to partition by
} technical configuration {
partition by
range (ValidationAreaID) (
partition value = 1,
partition value = 2,
partition value = 3,
partition others
),
range (VersionValidTo) (
partition value = null,
partition others
)
;
instead of
} technical configuration {
partition by
range (ValidationAreaID) (
partition value = 1,
partition value = 2,
partition value = 3,
partition others
),
range (VersionValidTo) (
partition value = '9999-12-31',
partition others
)
;
However trying to partition by a null value results in the error message : Syntax error: unexpected token "null"
Upvotes: 0
Views: 443
Reputation: 10396
To provide a closable answer:
The partition definition clauses don't allow an IS NULL
check.
The partition either needs to be specified by one distinct and uniquely identifiable string or (unsigned) numeral or by a closed range of values (see here).
This answers the part whether it's possible to create a partition for records where the condition IS NULL
evaluated to true: it's not.
The second part of the answer addresses the claim that a check for IS NULL
is faster than a check for a specific value.
This is not generally true. While you may find data distributions for which checking for NULL
entries in a specific column can be done quicker than scanning the entire main segment of that column, this is not something special of the NULL
entry.
Depending on the overall distribution of distinct values in any column (and across all columns in the table), SAP HANA will sort and compress the value ID pointers in the main segment of the column.
If e.g. the majority of all entries in a column is currently NULL
it may well result in a compression that puts all NULL
entries at the very top and compresses those with RLE.
A general search for IS NULL
would be very fast in this case.
Likewise, the compression could change for other very prominent values of that column.
The only technical difference in the column store for NULL
s, that I'm aware of, is that they have a hard-coded and fixed value ID so the lookup into the dictionary can be avoided and all columns share the same value ID for NULL
entries. As the dictionary lookup is usually not the bottleneck in statement execution, it's fair to say that the "NULL is faster" idea is not true.
Upvotes: 1