How to improve query performance on a single node in MonetDB?

Question

I've installed the latest (MonetDB 5 server v11v.27.5 "Jul2017-SP1") on Windows 2012 Server and I'm trying to query large table 1,4 billion rows in a reasonable time 2-3s.

Is this even possible with MonetDB? What could I do to improve the performance?

Detailed description of what I've done so far:

Created table:

CREATE TABLE t939ba ( id INT, xa INT, xb INT, ya INT, yb INT, a1 TINYINT, a2 TINYINT, a3 TINYINT, a4 TINYINT, a5 TINYINT, a6 TINYINT, a7 TINYINT, a8 TINYINT, a9 TINYINT);

Loaded the data:

COPY 1450000000 OFFSET 2 RECORDS INTO tbl FROM 'D:\es_export\file.csv'
USING DELIMITERS ',' NULL AS '' LOCKED;

Run the query:

SELECT COUNT(DISTINCT id) FROM tbl WHERE a1=22
AND xb>=143455 AND yb>=90911 AND xa<=143615 AND ya<=91007
AND a2 IN (2, 3, 4) AND a3 IN (0, 1, 2, 3, 4) AND a4 IN (0, 1, 2)
AND a5 IN (-1, 1, 2, 3, 4, 5, 6, 7) AND a6 IN (-1, 11, 12, 13, 14);

When I run the query for the 1st time it took (14m 52s), 2nd run of the same query took (3m 23s), 3rd consecutive run of the same query took (14s) and a slightly rearranged query took (3m 11s).

Stefan Manegold · Accepted Answer

Tomas,

thanks for the plans and traces. I see that you used the revised query with range predicates instead of IN predicates, and that this query now runs in "a mere" ~39 s (compared to ~15 min) --- either because the range predicates are evaluated more efficiently than the IN predicates, or because, as Martin indicated, a later run of the query benefits from indexes that where automatically built by MonetDB when evaluating the first query, or because of both.

In any case, running a/each query (-version) more than once is a good idea to see the possible effect of automatically built indexes.

Further, I see that either you indeed have a 34-core machine, or your machine has "only" 2 GB RAM per core --- not too much given that you have a ~42 GB data set, where each column is ~1.5 GB to ~6 GB in size ...

Hence, the main reason for the query not running faster than ~39 s might be I/O activity due to "lack" of memory.

Best,

Stefan

ps:
You can check whether for this specific query, reducing (or even avoiding) multi-core parallelism helps to reduce I/O thrashing:
Try running your query after disabling MonetDB's "mitosis" optimizer using

set optimizer='no_mitosis_pipe';

You can re-enable full multi-core parallelism using

set optimzer='default_pipe';

Best,
Stefan

How to improve query performance on a single node in MonetDB?

Answers (2)

Related Questions