JanusGraph does full table scans for equality queries instead of using the index backend

Question

I'm running JanusGraph Server backed by Amazon Keyspaces and Elasticsearch. The Elasticsearch backend is properly configured, and the data load process persists data in Elasticsearch as expected.

However, JanusGraph does full scans for equality-based queries. It does not use the indexes.

Example:

gremlin> g.E().has("edge_id","axxxxxxxx6a1796de717e9df").profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[edge_id.eq(axxxxxxxx6a1796de...                                          1227.690   100.00
  constructGraphCentricQuery                                                                   0.087
  constructGraphCentricQuery                                                                   0.003
  GraphCentricQuery                                                                         1227.421
    \_condition=(edge_id = axxxxxxxx6a1796de717e9df)
    \_orders=[]
    \_isFitted=false
    \_isOrdered=true
    \_query=[]
    scan                                                                                    1227.316
    \_query=[]
    \_fullscan=true
    \_condition=EDGE
                                            >TOTAL                     -           -        1227.690        -

When I use textContains, it does use the index:

g.E().has("edge_id",textContains("axxxxxxxx6a1796de717e9df")).bothV().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[edge_id.textContains(axxxx.....                     2           2        1934.487   100.00
  constructGraphCentricQuery                                                                   0.125
  GraphCentricQuery                                                                         1934.234
    \_condition=(edge_id textContains axxxxxxxx6a1796de717e9df)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=[(edge_id textContains axxxxxxxx6a1796de717e9df)]:edge_information
    \_index=edge_information
    \_index_impl=search
    backend-query                                                      2                    1934.207
    \_query=edge_information:[(edge_id textContains axxxxxxxx6a1796de717e9df)]:edge_information
EdgeVertexStep(BOTH)                                                   4           4           0.043     0.00
                                            >TOTAL                     -           -        1934.530        -

Is there a configuration option that controls this behavior? Full table scans are very inefficient for a lookup like this.

When I run JanusGraph locally, it does use the index backend even for equality queries.

Boxuan Li · Accepted Answer

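See https://docs.janusgraph.org/index-backend/text-search/#full-text-search. By default, string properties in a mixed index are indexed for full-text search only (the TEXT mapping). That mapping serves predicates such as textContains but not equality, which is why your equality query falls back to a full scan. For equality matches, index the key with the String search mapping (STRING) or the Full text + String search mapping (TEXTSTRING).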
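As a rough sketch of what that could look like in the Gremlin Console: the snippet below builds a second mixed index on edge_id with the STRING mapping and then reindexes existing data. The index name edge_information_string is made up for illustration, and the snippet assumes that a `graph` variable is bound to your JanusGraph instance, that the edge_id property key already exists with a String data type, and that the index backend is named search (as the profile output above suggests). Use Mapping.TEXTSTRING instead if you need both equality and textContains on the same key.

```groovy
import org.janusgraph.core.schema.Mapping
import org.janusgraph.core.schema.SchemaAction
import org.janusgraph.graphdb.database.management.ManagementSystem

// Close any open transaction before changing the schema.
graph.tx().rollback()

// Define a mixed index on edge_id using the STRING mapping (exact match).
// 'edge_information_string' is a hypothetical index name.
mgmt = graph.openManagement()
edgeId = mgmt.getPropertyKey('edge_id')
mgmt.buildIndex('edge_information_string', Edge.class).addKey(edgeId, Mapping.STRING.asParameter()).buildMixedIndex('search')
mgmt.commit()

// Wait until the new index is registered on all instances.
ManagementSystem.awaitGraphIndexStatus(graph, 'edge_information_string').call()

// Reindex so that edges written before the index existed are covered.
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex('edge_information_string'), SchemaAction.REINDEX).get()
mgmt.commit()
```

Once the index is enabled, rerunning g.E().has("edge_id", "...").profile() should show whether the query is answered by the index rather than by a scan with \_fullscan=true.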
