user595014

Reputation: 124

eDisMax parser issue running over multiple fields

Environment: Solr 8.9.0, java version "11.0.12" 2021-07-20 LTS

The following .csv file is indexed in Solr:

books_id,cat,name,price,inStock,author,series_t,sequence_i,genre_s
0553573403,book,Game Thrones Clash,7.99,true,George R.R. Martin,"A Song of Ice and Fire",1,fantasy
0553573404,book,Game Thrones,7.99,true,George Martin,"A Song of Ice and Fire",1,fantasy
0553573405,book,Game Thrones,7.99,true,George,"A Song of Ice and Fire",1,fantasy

I want to search for a book whose name is 'Game Thrones Clash' (with mm=75%) and whose author is 'George R.R. Martin' (with mm=70%).

The book name should be searched only in the 'name' field, with its own minimum-match value. The author should likewise be searched only in the 'author' field, with a different mm value.

The field type text_general is configured for the fields 'name' and 'author', with multiValued set to false.

The query should run over the field 'name' (mm=75%) with the value 'Game Thrones Clash' and over the field 'author' (mm=70%) with the value 'George R.R. Martin'.

Only results that satisfy all three of the following criteria should be returned:

  1. At least 75% of the query tokens fuzzy-match in the 'name' field.
  2. At least 70% of the query tokens fuzzy-match in the 'author' field.
  3. The 'inStock' field has the value 'true'.

The output should contain the following results:

0553573403 (name 75% matched and author 70% matched)
0553573404 (name 75% matched and author 70% matched)

The following books_id should not appear in the output:

0553573405 (name 75% matched, but author not 70% matched)

I understand that Extended DisMax provides the query parameter 'mm' (minimum should match) and supports fuzzy search, but the following query returns all 3 documents.

curl -G http://$solrIp:8983/solr/testCore2/select --data-urlencode "q=(name:'Game~' OR name:'Thrones~' OR name:'Clash~')" --data-urlencode "defType=edismax" --data-urlencode "mm=75%" --data-urlencode "q=(author:'George~' OR author:'R.R.~' OR author:'Martin~')" --data-urlencode "defType=edismax" --data-urlencode "mm=70%" --data-urlencode "sort=books_id asc"
{
  "responseHeader":{
    "status":0,
    "QTime":3,
    "params":{
      "mm":["75%",
        "70%"],
      "q":["(name:'Game~' OR name:'Thrones~' OR name:'Clash~')",
        "(author:'George~' AND author:'R.R.~' AND author:'Martin~')"],
      "defType":["edismax",
        "edismax"],
      "sort":"books_id asc"}},
  "response":{"numFound":3,"start":0,"numFoundExact":true,"docs":[
      {
        "books_id":[553573403],
        "cat":["book"],
        "name":"Game Thrones Clash",
        "price":[7.99],
        "inStock":[true],
        "author":"George R.R. Martin",
        "series_t":"A Song of Ice and Fire",
        "sequence_i":1,
        "genre_s":"fantasy",
        "id":"3de00ecb-fbaf-479b-bfde-6af7dd63c60f",
        "_version_":1738326424041816064},
      {
        "books_id":[553573404],
        "cat":["book"],
        "name":"Game Thrones",
        "price":[7.99],
        "inStock":[true],
        "author":"George Martin",
        "series_t":"A Song of Ice and Fire",
        "sequence_i":1,
        "genre_s":"fantasy",
        "id":"a036a400-4f54-4c90-a52e-888349ecb1da",
        "_version_":1738326424107876352},
      {
        "books_id":[553573405],
        "cat":["book"],
        "name":"Game Thrones",
        "price":[7.99],
        "inStock":[true],
        "author":"George",
        "series_t":"A Song of Ice and Fire",
        "sequence_i":1,
        "genre_s":"fantasy",
        "id":"36360825-1164-4cb6-bf48-ebeaaff0ef10",
        "_version_":1738326424111022080}]
  }}

Can someone help me write an eDisMax query for this, or suggest another approach?

Upvotes: 2

Views: 495

Answers (1)

user9712582

Reputation: 1673

Nested Queries can specify different query parameters for different parts of the query.

+_query_:"{!edismax mm=75% df=name} Game~ Thrones~ Clash~"
+_query_:"{!edismax mm=70% df=author} George~ R.R.~ Martin~"
+inStock:true
  • Nested Queries are specified by using _query_: as the prefix (see references [1], [2]).
  • Local Params [3] {! ... }... specify the query parser and its parameters by prefixing the query string.
    • query type [4] eDisMax [5]: {!type=edismax}... or just {!edismax}...
    • minimum [should-term] match (mm) [6]: {!mm=75%}...
    • default field (df) [7]: {!df=author}...
  • Requiring both of two subqueries can be specified via either of the following:
    • +query1 +query2
    • query1 AND query2
  • Line breaks are for clarity, all lines are within one query parameter (q=).
    • (If multiple q= parameters exist, adding a debugQuery=true parameter shows that only one is parsed.)
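
To see why these two mm values admit the first two books and reject the third, here is a rough Python sketch of the percentage arithmetic. It relies on the rule in the mm reference [6] that the number computed from a percentage is rounded down. Everything else is an assumption for illustration: the function names are made up, only plain integer and percentage specs are handled, and whitespace splitting with exact lowercase comparison stands in for Solr's real analysis and fuzzy matching.

```python
def min_should_match(spec: str, optional_clauses: int) -> int:
    """Return how many optional clauses must match for a simple mm spec.

    Covers only plain integers ("2", "-1") and plain percentages
    ("75%", "-25%"); conditional specs such as "2<75%" are not handled.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        calc = optional_clauses * int(spec[:-1]) / 100
        # int() truncates toward zero, so the computed number is rounded down.
        required = optional_clauses + int(calc) if calc < 0 else int(calc)
    else:
        value = int(spec)
        required = optional_clauses + value if value < 0 else value
    return max(0, min(optional_clauses, required))


def matched_terms(query: str, field_value: str) -> int:
    """Count query terms present in the field (exact match, an assumption)."""
    field_tokens = set(field_value.lower().split())
    return sum(1 for term in query.lower().split() if term in field_tokens)


# Name and author values copied from the question's CSV.
BOOKS = {
    "0553573403": ("Game Thrones Clash", "George R.R. Martin"),
    "0553573404": ("Game Thrones", "George Martin"),
    "0553573405": ("Game Thrones", "George"),
}

name_needed = min_should_match("75%", 3)    # 2.25 rounds down to 2
author_needed = min_should_match("70%", 3)  # 2.1 rounds down to 2

hits = [
    book_id
    for book_id, (name, author) in BOOKS.items()
    if matched_terms("Game Thrones Clash", name) >= name_needed
    and matched_terms("George R.R. Martin", author) >= author_needed
]
print(name_needed, author_needed, hits)
```

With three terms per subquery, both 75% and 70% come out as "at least 2 of 3", so the author "George" alone (1 of 3) falls short.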

Testing

Steps to reproduce the test with a local Solr 9, running in Cloud Mode so that the Schema Designer page is available:

  • Command line: solr-9.0.0/bin/solr start -e cloud
    • Answer the prompts until you return to the command prompt.
  • In a web browser, open Solr Admin: http://localhost:8983/solr/
  • On the Schema Designer tab:
    • Click New Schema.
    • Name New Schema: eDisMaxTest1
    • Copy From: _default
    • Click Add Schema.
  • Under Sample Documents, paste the CSV data and click Analyze Documents.
  • Under Schema Editor, select the fields author and name, and for each:
    • Change the type to text_general.
    • Disable the [ ] Multi-Valued checkbox.
    • Click Update Schema.
  • To find the URL, open the browser's F12 debugger from the Schema Designer page:
    • Enable the Network tab.
    • Perform a query: from the Query Tester, click RunQuery.
    • Copy the path http://localhost:8983/api/schema-designer/query
    • Copy the parameters _=1658489222229 and configSet=eDisMaxTest1 (values may differ).

Request: one query string

bash: (\ before newline continues line, omit inside '...')

curl --silent --get localhost:8983/api/schema-designer/query \
  --data-urlencode "_=1658489222229" \
  --data-urlencode "configSet=eDisMaxTest1" \
  --data-urlencode 'q= +_query_:"{!edismax mm=75% df=name} Game~ Thrones~ Clash~" 
                       +_query_:"{!edismax mm=70% df=author} George~ R.R.~ Martin~"
                       +inStock:true' \
  --data-urlencode "sort=books_id asc"

cmd: (^ before newline continues line, even inside '...')

curl --silent --get localhost:8983/api/schema-designer/query ^
  --data-urlencode "_=1658489222229" ^
  --data-urlencode "configSet=eDisMaxTest1" ^
  --data-urlencode 'q= +_query_:"{!edismax mm=75% df=name} Game~ Thrones~ Clash~" ^
                       +_query_:"{!edismax mm=70% df=author} George~ R.R.~ Martin~" ^
                       +inStock:true' ^
  --data-urlencode "sort=books_id asc"

Alternative request: use the local parameter v=$param in each nested query to refer to other request parameters that contain the subquery terms.

bash: (\ before newline continues line, omit inside '...')

curl --silent --get localhost:8983/api/schema-designer/query \
  --data-urlencode "_=1658489222229" \
  --data-urlencode "configSet=eDisMaxTest1" \
  --data-urlencode 'q= +_query_:"{!edismax mm=75% df=name v=$qname}"
                       +_query_:"{!edismax mm=70% df=author v=$qauthor}"
                       +inStock:true' \
  --data-urlencode "qname= Game~ Thrones~ Clash~" \
  --data-urlencode "qauthor= George~ R.R.~ Martin~" \
  --data-urlencode "sort=books_id asc"

cmd: (^ before newline continues line, even inside '...')

curl --silent --get localhost:8983/api/schema-designer/query ^
  --data-urlencode "_=1658489222229" ^
  --data-urlencode "configSet=eDisMaxTest1" ^
  --data-urlencode 'q= +_query_:"{!edismax mm=75% df=name v=$qname}" ^
                       +_query_:"{!edismax mm=70% df=author v=$qauthor}" ^
                       +inStock:true' ^
  --data-urlencode "qname= Game~ Thrones~ Clash~" ^
  --data-urlencode "qauthor= George~ R.R.~ Martin~" ^
  --data-urlencode "sort=books_id asc"
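
If the request is assembled in code rather than on the command line, the same parameters can be built with the Python standard library. This is only a sketch: the helper names nested_edismax and build_params are hypothetical, the clauses are joined with single spaces instead of line breaks, and nothing is sent to Solr. The resulting query string would still have to be appended to whatever endpoint applies to your setup.

```python
from urllib.parse import parse_qs, urlencode


def nested_edismax(field: str, mm: str, param: str) -> str:
    """Build one required nested clause whose terms come from another request parameter."""
    # Doubled braces are literal braces in an f-string; "$" is passed through as-is.
    return f'+_query_:"{{!edismax mm={mm} df={field} v=${param}}}"'


def build_params(name_terms: str, author_terms: str) -> dict:
    """Assemble the request parameters used in the v=$param variant above."""
    q = " ".join([
        nested_edismax("name", "75%", "qname"),
        nested_edismax("author", "70%", "qauthor"),
        "+inStock:true",
    ])
    return {
        "q": q,
        "qname": name_terms,
        "qauthor": author_terms,
        "sort": "books_id asc",
    }


params = build_params("Game~ Thrones~ Clash~", "George~ R.R.~ Martin~")
query_string = urlencode(params)  # percent-encodes "+", quotes, braces and "$"

print(params["q"])
print(query_string)
```

urlencode does the job of curl's --data-urlencode here. In particular the leading + of each clause is sent as %2B, so it is not decoded as a space on the server side.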

Response: two books as desired

{
  ...
  "responseHeader":{
    ...
    "params":{
      "q":" +_query_:\"{!edismax mm=75% df=name v=$qname}\"\n
            +_query_:\"{!edismax mm=70% df=author v=$qauthor}\"\n
            +inStock:true",
      "qauthor":" George~ R.R.~ Martin~",
      "qname":" Game~ Thrones~ Clash~",
      "sort":"books_id asc",
      "configSet":"eDisMaxTest1",
      "wt":"javabin",
      "version":"2",
      "_":"1658489222229"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "name":"Game Thrones Clash",
        "author":"George R.R. Martin",
        ...
        "books_id":553573403,
        "inStock":true},
      {
        "name":"Game Thrones",
        "author":"George Martin",
        ...
        "books_id":553573404,
        "inStock":true}]
  }}

References

[1] Nested Query Parser (Solr Reference Guide / Query Guide / Query Syntax and Parsers / Other Query Parsers) https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#nested-query-parser

[2] Nested Queries in Solr https://lucidworks.com/post/nested-queries-in-solr/

[3] Local Params (Solr Reference Guide / Query Guide / Query Syntax and Parsers) https://solr.apache.org/guide/solr/latest/query-guide/local-params.html

[4] Query type short form (Solr Reference Guide / Query Guide / Query Syntax and Parsers / Local Params) https://solr.apache.org/guide/solr/latest/query-guide/local-params.html#query-type-short-form

[5] Extended DisMax (eDisMax) Query Parser (Solr Reference Guide / Query Guide / Query Syntax and Parsers) https://solr.apache.org/guide/solr/latest/query-guide/edismax-query-parser.html

[6] mm (Minimum [Should-term] Match) Parameter (Solr Reference Guide / Query Guide / Query Syntax and Parsers / DisMax Query Parser) https://solr.apache.org/guide/solr/latest/query-guide/dismax-query-parser.html#mm-minimum-should-match-parameter

[7] df [default field] (Solr Reference Guide / Query Guide / Query Syntax and Parsers / Standard Query Parser / Standard Query Parser Parameters) https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html#standard-query-parser-parameters

Upvotes: 1
