bxfckrclooo

Reputation: 598

Scaling a field value within the result set

I am using the scale function to normalize the values of a field into a fixed range (1 to 2 in the examples below). The problem is that the values are normalized relative to all the indexed documents, not relative to the result set.

For example:

/select?q=id:173540413&fl=id,scale(id, 1, 2) 
id,"scale(id, 1, 2)"
173540413,1.9903924

/select?q=id:(173540413 173540377)&fl=id,scale(id, 1, 2) 
id,"scale(id, 1, 2)"
173540413,1.9903924
173540377,1.9903922

The desired result would be:

/select?q=id:173540413&fl=id,scale(id, 1, 2) 
id,"scale(id, 1, 2)"
173540413,1

/select?q=id:(173540413 173540377)&fl=id,scale(id, 1, 2) 
id,"scale(id, 1, 2)"
173540413,2
173540377,1

Is there some other way to scale the results, perhaps without using scale?

Upvotes: 0

Views: 363

Answers (2)

Rangesh Ananthalwar

Reputation: 178

There is a roundabout way to achieve this in Solr.

Solr's scale function behaves differently depending on whether the value it is applied to is query-dependent or not.

When you scale a field like 'id' that is not query-dependent, Solr takes the min and max from the entire document set rather than from the query result set. When you scale a query-dependent value like query($q) (the TF-IDF text similarity score of a document against the search term), Solr takes the min and max from the search result set only.
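For reference, the linear min-max mapping that this kind of scaling is usually understood to perform can be written as below. This is a sketch under the assumption that scale(x, a, b) maps linearly onto [a, b]; the symbols x_min and x_max are notation introduced here for the extremes of whichever set of documents Solr considers.

```latex
\[
\operatorname{scale}(x, a, b) = a + \frac{(x - x_{\min})\,(b - a)}{x_{\max} - x_{\min}}
\]
```

With a = 1 and b = 2, a value close to the largest id in the whole index scales to just under 2. That would be consistent with the 1.99... values in the question if the queried ids sit near the top of the indexed id range.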

Now, what we want is the second option. So, we do something like this:

q=searchterm&fl=id,scale(sub(sum(id,query($q)),query($q)), 1, 2)

This is what we're doing with the field: id + query($q) - query($q)

This tricks Solr into treating the expression as query-dependent, when in fact it evaluates to the same 'id' value. The result is scaled over the range [1, 2] relative to the result set, as intended.

Apart from scale(), I believe this approach will also work with other function queries, like max() and min(), that operate on the entire document set instead of just the query result set.

Upvotes: 4

MatsLindh

Reputation: 52892

You could use the Stats Component to get the min/max values for your set, then do the scaling yourself in your middleware.

The element under stats / stats_fields / fieldname in the response should have min and max keys giving the smallest and largest values of that field in your query result.
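A minimal sketch of the middleware side in Python, assuming the request was sent with stats=true&stats.field=id and the JSON response has already been parsed into a dict. The helper names (extract_min_max, scale_value) and the sample_response dict are hypothetical and only illustrate the shape described above; they are not part of Solr or any client library.

```python
def extract_min_max(solr_response, field):
    """Read min/max for `field` from the stats / stats_fields section."""
    field_stats = solr_response["stats"]["stats_fields"][field]
    return field_stats["min"], field_stats["max"]


def scale_value(value, vmin, vmax, lo=1.0, hi=2.0):
    """Linearly map value from [vmin, vmax] onto [lo, hi]."""
    if vmax == vmin:
        # Single-valued result set: fall back to the lower bound (a choice made here).
        return lo
    return lo + (value - vmin) * (hi - lo) / (vmax - vmin)


# Hypothetical parsed response, shaped like the structure described above.
sample_response = {
    "response": {"docs": [{"id": 173540413}, {"id": 173540377}]},
    "stats": {"stats_fields": {"id": {"min": 173540377.0, "max": 173540413.0}}},
}

vmin, vmax = extract_min_max(sample_response, "id")
scaled = {
    doc["id"]: scale_value(doc["id"], vmin, vmax)
    for doc in sample_response["response"]["docs"]
}
print(scaled)
```

Because the min and max come from the stats of the same query, the scaling is relative to the result set. Returning the lower bound when min equals max matches the single-document case in the question, where the desired value is 1.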

Upvotes: 0
