Aneesh Mon N

Reputation: 696

Solr Join parser performance issue

Solr Version: 6.3.0
Cloud: Yes
Shards: Single(1)
Data Size: 50GB
Records: 12M

We have a Solr join query that finds related ids within the same collection (yes, a self join). It is causing a performance hit.

On analysis, we found that Solr scans all the terms of from_field, regardless of the q filter specified, and only then intersects them with the to_field terms. Is there a way to make the join parser filter the terms before intersecting them with to_field?

The field has around 9M terms, which we assume is the cause of the performance hit.

"join": {

    "{!join from=from_field to=to_field fromIndex=insight_pats_1_shard1_replica1}to_field: \u0001\u0000\u0000\u0000\u0000\u0000\u0003X\u0002H": {
        "time": 16824,
        "fromSetSize": 1,
        "toSetSize": 0,
        "fromTermCount": 8561723,
        "fromTermTotalDf": 8561723,
        "fromTermDirectCount": 8561505,
        "fromTermHits": 0,
        "fromTermHitsTotalDf": 0,
        "toTermHits": 0,
        "toTermHitsTotalDf": 0,
        "toTermDirectCount": 0,
        "smallSetsDeferred": 0,
        "toSetDocsAdded": 0
    }

},
"rawquerystring": "*:*",
"querystring": "*:*",
"parsedquery": "(+MatchAllDocsQuery(*:*))/no_coord",
"parsedquery_toString": "+*:*",
"explain": { },
"QParser": "ExtendedDismaxQParser",
"altquerystring": null,
"boost_queries": null,
"parsed_boost_queries": [ ],
"boostfuncs": null,
"filter_queries": [

    "account_ids:1",
    "{!join from=from_field to=to_field fromIndex=insight_pats_1}to_field:7733576"

],
"parsed_filter_queries": [

    "account_ids:1",
    "JoinQuery({!join from=from_field to=to_field fromIndex=insight_pats_1_shard1_replica1}to_field: \u0001\u0000\u0000\u0000\u0000\u0000\u0003X\u0002H)"

]

Upvotes: 1

Views: 1008

Answers (1)

Aneesh Mon N

Reputation: 696

There are two types of join parsers available:

  • JoinQueryParser
  • ScoreJoinQParser

By default, !join uses JoinQueryParser, which is not optimal for joins where the number of records runs into the millions.

We can ask Solr to use ScoreJoinQParser instead by adding the parameter score=none to the !join parser command, as shown below.

http://localhost:8983/solr/mycollection/select?fq={!join from=from_field to=to_field fromIndex=from_collection score=none}&indent=on&q=*:*&wt=json&debugQuery=on

With this change we achieved a 30 times improvement in performance for a from_field with around 8 million terms.
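
For anyone building this request from code, here is a minimal Python sketch that assembles the same kind of URL with score=none and URL-encodes the local-params syntax. The helper name build_join_url is hypothetical, and the inner query to_field:7733576 is simply borrowed from the filter query shown in the question; the host, collection and field names are the placeholders used above, so adjust them for your setup.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_join_url(base_url, from_field, to_field, from_index, inner_query, score="none"):
    """Build a Solr /select URL whose fq is a {!join} carrying a score parameter.

    Hypothetical helper for illustration; it only formats a string and
    does not contact Solr.
    """
    # Doubled braces are literal braces inside an f-string.
    join_fq = (
        f"{{!join from={from_field} to={to_field} "
        f"fromIndex={from_index} score={score}}}{inner_query}"
    )
    params = [
        ("q", "*:*"),
        ("fq", join_fq),
        ("indent", "on"),
        ("wt", "json"),
        ("debugQuery", "on"),  # keeps the join timing in the debug section
    ]
    # urlencode percent-escapes the braces, '!', '=', ':' and spaces in fq.
    return f"{base_url}/select?{urlencode(params)}"


url = build_join_url(
    "http://localhost:8983/solr/mycollection",  # placeholder host/collection
    "from_field",
    "to_field",
    "from_collection",
    "to_field:7733576",  # inner query taken from the question's fq
)
print(url)

# Decode the fq again to check what Solr would receive.
print(parse_qs(urlsplit(url).query)["fq"][0])
```

Leaving debugQuery=on in place makes it easy to compare the time reported for the join before and after adding score=none.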

Upvotes: 1
