Can

Reputation: 369

Solr gives the same score for different values

I have a field type defined in my schema.xml as follows:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" />
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        </analyzer>
    </fieldType>

Here is my field:

    <dynamicField name="*_text" type="text" indexed="true" stored="true" />

When I query the value "am26" in the Solr admin query screen, I get the following results. (The documents have many fields; I selected only the code_text and score fields to show here.)

    "response": {
      "numFound": 6,
      "start": 0,
      "maxScore": 1184.7297,
      "docs": [
        {
          "code_text": "AM232",
          "score": 1184.7297
        },
        {
          "code_text": "AM238",
          "score": 1184.7297
        },
        {
          "code_text": "AM266",
          "score": 1184.7297
        },
        {
          "code_text": "AM268",
          "score": 1184.7297
        },
        {
          "code_text": "AM269",
          "score": 1184.7297
        },
        {
          "code_text": "AM273",
          "score": 1184.7297
        }
      ]
    }

How can AM232 and AM266 have the same score? Furthermore, why do values like AM232 and AM273 appear among the results at all? As far as I am aware, when we query "am26", Solr first converts the string to lower case (according to the definition in schema.xml), and WordDelimiterFilterFactory then splits it into "am" and "26". So I can understand results that include "26" and "AM", but I don't know why I see "AM232" and "AM273" in the results. On top of that, they all have exactly the same score.

Upvotes: 0

Views: 654

Answers (1)

femtoRgon

Reputation: 33341

As you said, your search terms will be "am" and "26".

However, there is no wildcard involved in this search. All of the results given match the "am" part, but none of them match "26". For "AM266", the indexed terms are "am" and "266", and the term "26" is not a match for "266". Since every document matches only on "am", they all end up with the same score. I expect that if you had a document "AM26", you would indeed see it get a higher score than the others.
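
To make the term matching concrete, here is a rough Python sketch of the idea. It is not Solr's actual analysis chain: the hypothetical `analyze` function only assumes that the field lowercases its input and splits it at letter/digit boundaries, and it ignores stop words and the catenate options. "AM26" is a made-up document value used for comparison.

```python
import re

def analyze(text):
    """Simplified stand-in for the "text" field analyzer (an assumption):
    lowercase the input, then split it into runs of letters and runs of digits."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())

# The query string goes through the same analysis as the indexed values.
query_terms = analyze("am26")

for code in ["AM232", "AM266", "AM26"]:
    indexed_terms = analyze(code)
    # A query term matches only if it is exactly equal to an indexed term.
    matched = [term for term in query_terms if term in indexed_terms]
    print(code, indexed_terms, "matched:", matched)
```

Under these assumptions, "AM232" and "AM266" each match only on "am", because "26" is not equal to "232" or "266", while the hypothetical "AM26" matches on both terms.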

Upvotes: 2
