Akshay


Solr returns wrong documents when searching a solr.StrField whose values contain dots

  1. Field type and field definition:

    <fieldType name="StrCollectionField" class="solr.StrField" omitNorms="true" multiValued="true" docValues="true"/>

    <field name="po_line_status_code" type="StrCollectionField" indexed="true" stored="true" required="false" docValues="false"/>

    po_no is the primary key.

  2. Indexed value: po_line_status_code:[3700.100]

  3. Search query: po_line_status_code:(1100.200 1100.500 1100.600 1100.400 1100.300 1100.750 1100.450)

Result: documents with po_line_status_code: [3700.100] are returned as well.

Does Solr internally tokenize solr.StrField values that contain dots, or is some regular-expression matching going on here? It looks like a bug to me.

We don't get this document when we change the query to either of the following:

  1. po_line_status_code:(1200.200 1200.500 1200.600 1200.400 1200.300 1200.750 1200.450)

  2. po_line_status_code:(1100.200 1100.500 1100.600 1100.400 1100.300 1100.750 1100.450) AND po_no:938792842

We are using DSE 4.7.4, which bundles Apache Solr 4.10.3.0.203.

Debug query output from one of the servers that returns the wrong documents:

    response={numFound=2,start=0,docs=[SolrDocument{po_no=4575419580094, po_line_status_code=[3700.4031]}, SolrDocument{po_no=1575479951283, po_line_status_code=[3700.100]}]},debug={rawquerystring=po_line_status_code:(3 1100.200 29 5 6 1100.300 63 199 1100.500 200 1100.600 198 1100.400 343 344 345 346 347 409 410 428 1100.750 1100.450) ,querystring=po_line_status_code:(3 1100.200 29 5 6 1100.300 63 199 1100.500 200 1100.600 198 1100.400 343 344 345 346 347 409 410 428 1100.750 1100.450)]

I also see the following in the response, which I believe has something to do with ranking:

    No match on required clause (po_line_status_code:3 po_line_status_code:1100.200 po_line_status_code:29 po_line_status_code:5 po_line_status_code:6 po_line_status_code:1100.300 po_line_status_code:63 po_line_status_code:199 po_line_status_code:1100.500 po_line_status_code:200 po_line_status_code:1100.600 po_line_status_code:198 po_line_status_code:1100.400 po_line_status_code:343 po_line_status_code:344 po_line_status_code:345 po_line_status_code:346 po_line_status_code:347 po_line_status_code:409 po_line_status_code:410 po_line_status_code:428 po_line_status_code:1100.750 po_line_status_code:1100.450)
     0.0 = (NON-MATCH) product of:
     0.0 = (NON-MATCH) sum of:
     0.0 = coord(0/23)
     0.015334824

Also, could this have something to do with re-indexing? If I re-index my documents, will that fix the issue?

Links to a doc file containing the Solr schema and Solr config can be found here.



Answers (1)

David George


I've had to put this in an answer as the comments won't allow formatting.

No, it's not a version problem, a tokenizer problem, or a bug in Solr.

solr.StrField doesn't tokenize at either index time or query time: each value is indexed as a single, exact term. Your document is matching on something else. Can you post your solrconfig.xml and schema.xml?
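To illustrate, here is a toy Python model, not Solr code, built on two assumptions: the grouped query is expanded into one term clause per value, combined with the default OR operator, and a solr.StrField stores each value verbatim as a single term. Under those assumptions a document whose only value is 3700.100 cannot match any of the requested codes:

```python
import re

def expand_field_group(querystring):
    """Rewrite 'field:(v1 v2 ...)' as a list of (field, value) term clauses,
    roughly the shape shown in Solr's parsedquery (simplified: no escaping,
    quoting or nested groups)."""
    m = re.fullmatch(r"\s*(\w+):\((.*)\)\s*", querystring)
    if m is None:
        raise ValueError("expected a single field:(...) group")
    field = m.group(1)
    return [(field, value) for value in m.group(2).split()]

def str_field_matches(doc, clauses):
    """OR of exact term lookups: a StrField value is one untokenized term,
    so a clause matches only if the stored string is identical."""
    return any(value in doc.get(field, []) for field, value in clauses)

q = "po_line_status_code:(1100.200 1100.500 1100.600 1100.400 1100.300 1100.750 1100.450)"
clauses = expand_field_group(q)
print(" ".join(f"{f}:{v}" for f, v in clauses))

doc = {"po_no": ["1575479951283"], "po_line_status_code": ["3700.100"]}
print(str_field_matches(doc, clauses))  # False
```

The dot in 3700.100 plays no role here: with no tokenizer, the only way to match is string equality with one of the query values.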

If you are searching only on po_line_status_code, this is the debug output you should see:

    "querystring": " po_line_status_code:(1100.200 1100.500 1100.600 1100.400 1100.300 1100.750 1100.450)",
    "parsedquery": "(+(po_line_status_code:1100.200 po_line_status_code:1100.500 po_line_status_code:1100.600 po_line_status_code:1100.400 po_line_status_code:1100.300 po_line_status_code:1100.750 po_line_status_code:1100.450))/",

Whereas what you are seeing is:

    querystring=ship_node:610055 AND po_line_status_code:(3 1100.200 29 5 6 1100.300 63 199 1100.500 200 1100.600 198 1100.400 343 344 345 346 347 409 410 428 1100.750 1100.450) AND expected_ship_date:[2016-02-03T16:00:00.000Z TO 2016-06-09T13:59:59.059Z]

So your query string has been altered. I assume you are running all your queries through the Solr admin tool? If so, that should leave DSE out of the loop.
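One quick way to see how much the query changed is to compare the values you meant to send with those in the debug querystring. A minimal Python sketch, assuming both strings are copied verbatim from the question and the debug output; the regexes only handle the simple field:(...) group and bare field names, not full Solr syntax:

```python
import re

intended = "po_line_status_code:(1100.200 1100.500 1100.600 1100.400 1100.300 1100.750 1100.450)"
actual = ("ship_node:610055 AND po_line_status_code:(3 1100.200 29 5 6 1100.300 63 199 "
          "1100.500 200 1100.600 198 1100.400 343 344 345 346 347 409 410 428 1100.750 1100.450) "
          "AND expected_ship_date:[2016-02-03T16:00:00.000Z TO 2016-06-09T13:59:59.059Z]")

def group_values(querystring, field):
    """Return the whitespace-separated values inside field:( ... )."""
    m = re.search(re.escape(field) + r":\(([^)]*)\)", querystring)
    return m.group(1).split() if m else []

def other_fields(querystring, field):
    """Field names (identifier at start or after whitespace, followed by ':')
    other than the given one; the leading-letter rule skips the times in date ranges."""
    names = set(re.findall(r"(?:^|\s)([A-Za-z_]\w*):", querystring))
    return sorted(names - {field})

field = "po_line_status_code"
wanted = group_values(intended, field)
extra = [v for v in group_values(actual, field) if v not in wanted]
print("extra values:", extra)
print("other clauses:", other_fields(actual, field))
```

Every intended value is still present; what changed is the 16 extra values and the two additional AND clauses.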

I still wouldn't expect your query to match, but things are more complicated than you have presented them: your query also contains ship_node and expected_ship_date clauses.
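To take ship_node and expected_ship_date out of the picture, it can help to send the po_line_status_code clause on its own with debugQuery=true and compare the parsedquery and explain output. A minimal Python sketch that only builds such a request URL; the host, port and core name below are hypothetical placeholders (DSE Search cores are typically named keyspace.table), while `q`, `fl`, `wt` and `debugQuery` are standard Solr request parameters:

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute your own host, port and core name.
SOLR_SELECT = "http://localhost:8983/solr/mykeyspace.po_line/select"

def debug_url(q, fields="po_no,po_line_status_code"):
    """Build a /select URL that runs q on its own with debug output enabled."""
    params = {
        "q": q,               # the single clause under test
        "fl": fields,         # return only the fields we care about
        "wt": "json",         # JSON response writer
        "debugQuery": "true", # include parsedquery and explain in the response
    }
    return SOLR_SELECT + "?" + urlencode(params)

url = debug_url("po_line_status_code:(1100.200 1100.500 1100.600 1100.400 "
                "1100.300 1100.750 1100.450)")
print(url)
```

Opening that URL in a browser or with curl should show whether the clause alone still returns the 3700.100 document.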

Oh, and the "No match on required clause" message says that the document didn't match anything in the po_line_status_code part of the query.
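The coord(0/23) line says the same thing: as I understand Lucene 4.x explain output, coord(m/n) is the coordination factor, the number of matching clauses over the number of clauses in that boolean query. A small Python sketch, using hypothetical helper names, that counts the clauses in the explain text copied from the question and reads the coord ratio:

```python
import re

# Explain text copied from the question's debug output (trailing score omitted).
explain = (
    "No match on required clause (po_line_status_code:3 po_line_status_code:1100.200 "
    "po_line_status_code:29 po_line_status_code:5 po_line_status_code:6 "
    "po_line_status_code:1100.300 po_line_status_code:63 po_line_status_code:199 "
    "po_line_status_code:1100.500 po_line_status_code:200 po_line_status_code:1100.600 "
    "po_line_status_code:198 po_line_status_code:1100.400 po_line_status_code:343 "
    "po_line_status_code:344 po_line_status_code:345 po_line_status_code:346 "
    "po_line_status_code:347 po_line_status_code:409 po_line_status_code:410 "
    "po_line_status_code:428 po_line_status_code:1100.750 po_line_status_code:1100.450)\n"
    " 0.0 = (NON-MATCH) product of:\n"
    " 0.0 = (NON-MATCH) sum of:\n"
    " 0.0 = coord(0/23)"
)

def clause_count(text, field):
    """Count the field:value clauses listed in the 'No match on required clause (...)' line."""
    m = re.search(r"No match on required clause \(([^)]*)\)", text)
    if m is None:
        return 0
    return len(re.findall(re.escape(field) + r":\S+", m.group(1)))

def coord(text):
    """Return (matched, total) from a coord(m/n) explain line."""
    m = re.search(r"coord\((\d+)/(\d+)\)", text)
    if m is None:
        raise ValueError("no coord(m/n) line found")
    return int(m.group(1)), int(m.group(2))

matched, total = coord(explain)
print(f"{clause_count(explain, 'po_line_status_code')} clauses listed, coord says {matched} of {total} matched")
```

The 23 listed clauses line up with the denominator, and none of them matched this document.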

