Thorsten Niehues
Thorsten Niehues

Reputation: 14472

SAP HANA partitioning with col=null partition

In SAP HANA 1.0 SPS 12 we want to partition a table by ValidationAreaID and by VersionValidTo.

This is no problem so far.

But since a comparison with null is supposed to be faster than a timestamp I want to partition by

} technical configuration {
    partition by 
        range (ValidationAreaID) (
            partition value = 1,
            partition value = 2,
            partition value = 3,
            partition others
        ),
        range (VersionValidTo)  (
            partition value = null,
            partition others
        )
    ;

instead of

} technical configuration {
    partition by 
        range (ValidationAreaID) (
            partition value = 1,
            partition value = 2,
            partition value = 3,
            partition others
        ),
        range (VersionValidTo)  (
            partition value = '9999-12-31',
            partition others
        )
    ;

However trying to partition by a null value results in the error message :
Syntax error: unexpected token "null"

Upvotes: 0

Views: 443

Answers (1)

Lars Br.
Lars Br.

Reputation: 10396

To provide a closable answer: The partition definition clauses don't allow an IS NULL check. The partition either needs to be specified by one distinct and uniquely identifiable string or (unsigned) numeral or by a closed range of values (see here).

This answers the part whether it's possible to create a partition for records where the condition IS NULL evaluated to true: it's not.

The second part of the answer addresses the claim that a check for IS NULL is faster than a check for a specific value. This is not generally true. While you may find data distributions for which checking for NULL entries in a specific column can be done quicker than scanning the entire main segment of that column, this is not something special of the NULL entry. Depending on the overall distribution of distinct values in any column (and across all columns in the table), SAP HANA will sort and compress the value ID pointers in the main segment of the column. If e.g. the majority of all entries in a column is currently NULL it may well result in a compression that puts all NULL entries at the very top and compresses those with RLE. A general search for IS NULL would be very fast in this case. Likewise, the compression could change for other very prominent values of that column.

The only technical difference in the column store for NULLs, that I'm aware of, is that they have a hard-coded and fixed value ID so the lookup into the dictionary can be avoided and all columns share the same value ID for NULL entries. As the dictionary lookup is usually not the bottleneck in statement execution, it's fair to say that the "NULL is faster" idea is not true.

Upvotes: 1

Related Questions