Borv
Borv

Reputation: 643

Ignite Query / SqlQuery / SqlFieldsQuery: are "lazy" and "page size" actually doing anything?

Some background: we are building a machine learning farm off Ignite cluster. Part of the use case is generating training data sets, which are gigantic matrices (up to billions of rows x thousands of columns in theory), one row per data entry in Ignite cache.

We are using SqlQuery to fetch the records matching a predicate locally on each node, then iterate over records, generate vectors, write them into an external storage for further consumption. Each node exports data independently, so for 32 nodes we end up with 32 exported data sets.

The problem: small data set generations worked OK, but larger data set generations (queries expected to return 10M+ rows per node) basically kill the entire cluster, blowing out the nodes due to OOME and GC hell. We looked at "Performance and Debugging" section of Ignite docs (https://apacheignite-sql.readme.io/docs/performance-and-debugging#result-set-lazy-load), tried lazy result set and page size settings. Nope.

The investigations (profiling, memory dumps, debugger, etc), suggested that result sets of the queries are loaded into memory completely before our code gets to read the first row, even though we are using QueryCursor and iterations. The Query#pageSize and SqlFieldsQuery#setLazy do not seem to have any effect on that whatsoever. Digging deeper, it turned out that H2 (that Ignite indexing is using) does not support server-side cursor in result sets at all, and can only be configured to buffer the records onto disk (even with SSDs, this is a performance non-starter). Reading through Ignite source code suggested that Query#pageSize and SqlFieldsQuery#setLazy are only used in distributed queries / scan queries, and still Ignite does full reads of result sets on the nodes into memory.

Sigh. These are remediations we thought of:

The question: is our assessment correct and query result streaming is not a thing for local queries? Are there any more sensible workarounds?

Would appreciate some insight from Ignite comrades if they read this. Thanks in advance!

UPDATE: in case it matters, the cluster is 8 nodes as follows:

Architecture: x86_64 CPU(s): 32 Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz CPU MHz: 2345.904 BogoMIPS: 4600.05 Hypervisor vendor: Xen Virtualization type: full RAM: 240Gb

These are designated EC2-s with ephemeral volumes for Ignite /data mount.

Upvotes: 2

Views: 935

Answers (2)

Borv
Borv

Reputation: 643

Updating this after seemingly solving the issue.

Note: the question was about local queries.

It looks like using SqlFieldsQuery with 'lazy' helped for local queries. Local SqlQuery seems to be ignoring the lazy flag.

Upvotes: 0

alamar
alamar

Reputation: 19331

lazy does definitely do anything: for SELECT * FROM cache query, it's the difference between cluster failure and normal operation. page size should be working together with lazy.

Can you show your query? Also, I'm not sure that it really applies to local queries.

Upvotes: 0

Related Questions