Lars Gendner

Reputation: 1982

How to perform OR text search on nested documents in Solr

I have indexed a nested document structure in Solr 8.5.1 like this:

"docs": [
    {
        "id": "unmatching_parent_and_children",
        "searchtext": "bla bla bla",
        "entity_type": "parent",
        "_childDocuments_": [
            {
                "id": "unmatching_parent_and_children.child_1",
                "searchtext": "bla bla",
                "entity_type": "child_type_1"
            },
            {
                "id": "unmatching_parent_and_children.child_2",
                "searchtext": "bla bla bla",
                "entity_type": "child_type_2"
            }
        ]
    },
    {
        "id": "matching_parent_unmatching_children",
        "searchtext": "bla searchterm bla bla",
        "entity_type": "parent",
        "_childDocuments_": [
            {
                "id": "matching_parent_unmatching_children.child_1",
                "searchtext": "bla bla",
                "entity_type": "child_type_1"
            },
            {
                "id": "matching_parent_unmatching_children.child_2",
                "searchtext": "bla bla bla",
                "entity_type": "child_type_2"
            }
        ]
    },
    {
        "id": "unmatching_parent_matching_child_1",
        "searchtext": "bla bla bla",
        "entity_type": "parent",
        "_childDocuments_": [
            {
                "id": "unmatching_parent_matching_child_1.child_1",
                "searchtext": "bla searchterm bla",
                "entity_type": "child_type_1"
            },
            {
                "id": "unmatching_parent_matching_child_1.child_2",
                "searchtext": "bla bla bla",
                "entity_type": "child_type_2"
            }
        ]
    },
    {
        "id": "unmatching_parent_matching_child_2",
        "searchtext": "bla bla bla",
        "entity_type": "parent",
        "_childDocuments_": [
            {
                "id": "unmatching_parent_matching_child_2.child_1",
                "searchtext": "bla bla",
                "entity_type": "child_type_1"
            },
            {
                "id": "unmatching_parent_matching_child_2.child_2",
                "searchtext": "bla bla searchterm bla",
                "entity_type": "child_type_2"
            }
        ]
    }
]

I am looking for a query that runs a text search on searchtext across all parent and child documents and returns every parent for which the parent itself matches, at least one of its children matches, or both.

Something like this (this is pseudo code):

q=(entity_type:parent AND searchtext:searchterm) 
    OR ({!parent which="entity_type:parent"}(-entity_type:parent AND +searchtext:searchterm))
fl=id,[child parentFilter="entity_type:parent"]

Expected result:

"docs": [
    {
        "id": "matching_parent_unmatching_children",
        "_childDocuments_": [
            {
                "id": "matching_parent_unmatching_children.child_1"
            },
            {
                "id": "matching_parent_unmatching_children.child_2"
            }
        ]
    },
    {
        "id": "unmatching_parent_matching_child_1",
        "_childDocuments_": [
            {
                "id": "unmatching_parent_matching_child_1.child_1"
            },
            {
                "id": "unmatching_parent_matching_child_1.child_2"
            }
        ]
    },
    {
        "id": "unmatching_parent_matching_child_2",
        "_childDocuments_": [
            {
                "id": "unmatching_parent_matching_child_2.child_1"
            },
            {
                "id": "unmatching_parent_matching_child_2.child_2"
            }
        ]
    }
]
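
To pin down the intended matching logic independently of any Solr syntax, here is a minimal Python sketch over a trimmed copy of the sample data. The helper names (contains_term, matching_parents) are made up for illustration, and the whitespace tokenization is only a crude stand-in for whatever analysis the searchtext field really uses.

```python
# Trimmed copy of the sample documents: only the fields the matching logic needs.
docs = [
    {"id": "unmatching_parent_and_children", "searchtext": "bla bla bla",
     "_childDocuments_": [{"searchtext": "bla bla"},
                          {"searchtext": "bla bla bla"}]},
    {"id": "matching_parent_unmatching_children", "searchtext": "bla searchterm bla bla",
     "_childDocuments_": [{"searchtext": "bla bla"},
                          {"searchtext": "bla bla bla"}]},
    {"id": "unmatching_parent_matching_child_1", "searchtext": "bla bla bla",
     "_childDocuments_": [{"searchtext": "bla searchterm bla"},
                          {"searchtext": "bla bla bla"}]},
    {"id": "unmatching_parent_matching_child_2", "searchtext": "bla bla bla",
     "_childDocuments_": [{"searchtext": "bla bla"},
                          {"searchtext": "bla bla searchterm bla"}]},
]


def contains_term(doc, term):
    # Assumption: exact match on whitespace-separated tokens, not Solr's analyzer.
    return term in doc["searchtext"].split()


def matching_parents(docs, term):
    # A parent qualifies if it matches itself OR any of its children matches.
    return [
        parent["id"]
        for parent in docs
        if contains_term(parent, term)
        or any(contains_term(child, term)
               for child in parent.get("_childDocuments_", []))
    ]


print(matching_parents(docs, "searchterm"))
```

This prints the three parent ids listed in the expected result above and leaves out unmatching_parent_and_children.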

So far, I have had no success constructing a Solr query that meets this requirement. The query either produces parsing errors, is interpreted as plain search text without honoring the expressions in it, or matches only document structures in which both the parent AND its children have a matching searchtext. The query parsers I experimented with (in several combinations) are Lucene, eDisMax/DisMax, Block Join Parent and Simple.

Upvotes: 0

Views: 526

Answers (2)

Hammad Shabbir

Reputation: 741

You can use the Block Join Parent query parser like this. I used your data and schema, and it works as expected:

q=(searchtext:searchterm AND entity_type:parent) OR _query_: "{!parent which=entity_type:parent}+searchtext:searchterm"

fl=id,[child parentFilter="entity_type:parent"]
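
If you send this over HTTP yourself, keep in mind that the query contains characters such as +, {, ! and " that must be URL-encoded; an unencoded + in a query string is decoded as a space. Below is a minimal Python sketch that builds the request URL. The host and the collection name mycollection are assumptions, so adjust them to your setup.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical Solr endpoint: host, port and collection name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"

params = {
    # Query and field list copied verbatim from above.
    "q": '(searchtext:searchterm AND entity_type:parent) OR '
         '_query_: "{!parent which=entity_type:parent}+searchtext:searchterm"',
    "fl": 'id,[child parentFilter="entity_type:parent"]',
}

# urlencode percent-encodes reserved characters; a literal '+' becomes '%2B'.
url = SOLR_SELECT + "?" + urlencode(params)
print(url)

# Decoding the URL again gives back the original parameters unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == params["q"])
```

The resulting URL can be passed to any HTTP client; the sketch itself does not contact Solr.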

Upvotes: 0

Felix

Reputation: 56

I tried your query with my own dataset and got a parsing error too:

"Parent query must not match any docs besides parent filter. Combine them as must (+) and must-not (-) clauses to find a problem doc. docID=0"

I don't understand why this parsing error occurs.

But it worked for me once I moved the sub-query -entity_type:parent AND +searchtext:searchterm into the filters parameter of the {!parent} query:

q=(entity_type:parent AND searchtext:searchterm) OR ({!parent which="entity_type:parent" filters="-entity_type:parent AND +searchtext:searchterm"})

fl=id,[child parentFilter="entity_type:parent"]

It should return the same result. See also https://lucene.apache.org/solr/guide/8_5/other-parsers.html#filtering-and-tagging-2

I hope this helps you too.

Upvotes: 0
