Simon Warta
Simon Warta

Reputation: 11418

Is 'length() IS NULL' equivalent and faster than 'IS NULL' for BLOBs?

I am having a ~90 MB SQLite database on a SSD consisting mostly on message attachments including a BLOB column content, that stores the binary attachment data.

Now I found that the following query

SELECT message_id FROM attachments WHERE length(content) IS NULL;

is 500x faster (0.5ms vs. 250ms) than the original

SELECT message_id FROM attachments WHERE content IS NULL;

Is it true, that both queries are equivalent?

Additional info

  1. No indexes are involved apart from the autoindex.
  2. It's not caching. The result can be reproducued unlimited times in any order from any number of SQLite processes.

Upvotes: 7

Views: 1625

Answers (3)

Colonel Thirty Two
Colonel Thirty Two

Reputation: 26599

I made a script to benchmark both functions. length(x) IS NULL is faster unless you have mostly NULL values.

Results:

  • 50% alternating between random data and null:
    • IS NULL: 11.343180236999842
    • length(x) IS NULL: 7.824154090999855
  • Entirely blobs, no nulls:
    • IS NULL: 15.019244787999924
    • length(x) IS NULL: 7.527420233999919
  • Entirely nulls, no blobs:
    • IS NULL: 6.184766045999822
    • length(x) IS NULL: 6.448342310000044

Test script:

import sqlite3
import timeit

conn = sqlite3.connect("test.db")
c = conn.cursor()

c.execute("DROP TABLE IF EXISTS test")
c.execute("CREATE TABLE test (data BLOB)")

for i in range(10000):
    # Modify this to change data
    if i % 2 == 0:
        c.execute("INSERT INTO test(data) VALUES (randomblob(1024))")
    else:
        c.execute("INSERT INTO test(data) VALUES (NULL)")

def timeit_isnull():
    c.execute("SELECT data IS NULL AS dataisnull FROM test")
    c.fetchall()

def timeit_lenisnull():
    c.execute("SELECT length(data) IS NULL AS dataisnull FROM test")
    c.fetchall()

print(timeit.timeit(timeit_isnull, number=1000))
print(timeit.timeit(timeit_lenisnull, number=1000))

Upvotes: 3

CL.
CL.

Reputation: 180172

In SQLite, the length and type of each column value are stored at the beginning of the row. This allows the length() and typeof() functions to be optimized to avoid loading the actual value.

The IS NULL operator has no such optimization (although it would be possible to implement it).

Upvotes: 4

Bud Damyanov
Bud Damyanov

Reputation: 31889

Actually using LENGTH(some_blob_content) instructs MySQL server 5.5 and above to bypass row-scanning, this is the reason for improved performance of your query, because the data is read directly from meta-data table.

EDIT:

In SQLite during part INSERT and SELECT processing, the complete content of each row in the database is encoded as a single BLOB. See here for details.

Also for a string value X, the length(X) function returns the number of characters (not bytes) in X prior to the first NULL character. Since SQLite strings do not normally contain NULL characters, the length(X) function will usually return the total number of characters in the string X. For a blob value X, length(X) returns the number of bytes in the blob. If X is NULL then length(X) is NULL. If X is numeric then length(X) returns the length of a string representation of X.

The above quote from the documentation suggests that, for blob values, the procedure of reading their length is the same as in MySQL, i.e. from meta-data table.

Upvotes: 1

Related Questions