Garrett
Garrett

Reputation: 11725

Optimal index for query in OrientDB

I have the following query

SELECT FROM Post
WHERE activityLog IS NOT NULL
AND activityLog_pointerProcessed IS NULL
LIMIT 1000

The properties are defined as

CREATE PROPERTY Post.activityLog_pointerProcessed BOOLEAN
CREATE PROPERTY Post.activityLog EMBEDDEDMAP EMBEDDEDMAP

However, it seems to be running fairly slowly. I have added the following index

CREATE INDEX Post_activityLog_pointerProcessed
ON Post (activityLog, activityLog_pointerProcessed)
NOTUNIQUE_HASH_INDEX

I'm wondering if this is the optimal index for this scenario or if there's a different way to set this up. The Post class is still filling up but currently has 2 million rows. The query takes ~5-10 seconds to run.

Here's what I get when I explain it

orientdb {pumpup}> explain SELECT FROM Post WHERE activityLog IS NOT NULL AND activityLog_pointerProcessed IS NULL LIMIT 1000

Profiled command '{documentReads:378236,current:#22:378235,documentAnalyzedCompatibleClass:378236,recordReads:378236,fetchingFromTargetElapsed:337922,evaluated:378236,elapsed:337922.88,resultType:collection,resultSize:1000}' in 338.688995 sec(s): {"@type":"d","@version":0,"documentReads":378236,"current":"#22:378235","documentAnalyzedCompatibleClass":378236,"recordReads":378236,"fetchingFromTargetElapsed":337922,"evaluated":378236,"elapsed":337922.88,"resultType":"collection","resultSize":1000,"@fieldTypes":"documentReads=l,documentAnalyzedCompatibleClass=l,recordReads=l,fetchingFromTargetElapsed=l,evaluated=l,elapsed=f"}

Upvotes: 1

Views: 1187

Answers (1)

Lvca
Lvca

Reputation: 9060

By reading the explain result, the index hasn't been used and 378,236 documents were scanned.

The activityLog property has been declared as an EMBEDDEDMAP, so the internal values are indexed, not the fact the collection is null. I don't think it will work, but make a try enabling null values on index:

https://github.com/orientechnologies/orientdb/wiki/Indexes#null-values

Probably the best for this case is saving another boolean value "activityLog.empty" to be indexed and retrieve your query much faster.

CREATE PROPERTY Post.activityLogEmpty BOOLEAN
CREATE INDEX Post_activityLog_pointerProcessed ON Post (activityLogEmpty, activityLog_pointerProcessed) 
                                               NOTUNIQUE_HASH_INDEX

On creation set the property "activityLogEmpty" to FALSE and as soon as you populate the "activityLog" collection set it to true.

Upvotes: 2

Related Questions