shwetank

Reputation: 115

Vespa - Proton: Custom bucketing & Query

References:

id scheme

Format: id:<namespace>:<document-type>:<key/value-pairs>:<user-specified>

http://docs.vespa.ai/documentation/content/buckets.html
http://docs.vespa.ai/documentation/content/idealstate.html

It's possible to structure data with user-defined bucketing logic by using the 32 LSB of the document ID format (the n / g selections).

However, it isn't clear how the query logic can route queries to a specific bucket range based on a decision taken in advance.

For example, it is possible to split data by time range (start-time/end-time) if I define n (a number) that compresses the range. All documents tagged this way will end up in the same bucket (which then follows its normal course of splitting by document count / size, as configured).
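A minimal Java sketch of the idea in the question: derive n from a timestamp so that all documents from the same time window share a bucket. The namespace `mynamespace`, the document type `event`, and the one-day window are hypothetical choices for illustration. The `locationBits` helper only mirrors the question's statement that the 32 LSB of n are used for bucketing; it is an assumption, not a Vespa API.

```java
import java.time.Instant;

public class TimeBucketedIds {
    // Hypothetical namespace and document type, for illustration only.
    static final String NAMESPACE = "mynamespace";
    static final String DOC_TYPE = "event";
    static final long SECONDS_PER_DAY = 86_400L;

    // Maps a timestamp to a numeric key: whole UTC days since the Unix epoch.
    // All documents from the same day get the same n value.
    static long dayBucket(Instant timestamp) {
        return Math.floorDiv(timestamp.getEpochSecond(), SECONDS_PER_DAY);
    }

    // Assumption (from the question): only the 32 least significant bits of n
    // determine the document's location, so keys should fit in 32 bits.
    static long locationBits(long n) {
        return n & 0xFFFF_FFFFL;
    }

    // Builds id:<namespace>:<document-type>:n=<number>:<user-specified>.
    static String documentId(Instant timestamp, String key) {
        return "id:" + NAMESPACE + ":" + DOC_TYPE + ":n=" + dayBucket(timestamp) + ":" + key;
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2017-06-01T12:34:56Z");
        System.out.println(documentId(t, "event-42"));
    }
}
```

Querying one such window then means restricting the search to a single n value; a wider time range spans several consecutive n values.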

So how do I write a search query over data indexed this way? Is it possible to tell the processor to choose a specific bucket, or a range of buckets (in case the distribution algorithm has moved buckets)?

Upvotes: 3

Views: 218

Answers (2)

Jon

Reputation: 2339

You can choose one bucket in a query by specifying the streaming.groupname query property (for documents whose IDs use g=[group]).

Either in the HTTP request, by adding

&streaming.groupname=[group] 

or in a Searcher by

query.properties().set("streaming.groupname", "[group]");
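For the HTTP route, a small sketch that builds such a request URL with the group name URL-encoded. The endpoint `http://localhost:8080/search/`, the use of the `query` parameter, and the sample group name are assumptions for illustration; the code only builds the string and does not contact a server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GroupQueryUrl {
    // Hypothetical search endpoint; adjust host and port for your deployment.
    static final String SEARCH_ENDPOINT = "http://localhost:8080/search/";

    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds a request that restricts a streaming search to one group's bucket.
    static String groupQuery(String userQuery, String group) {
        return SEARCH_ENDPOINT + "?query=" + enc(userQuery)
                + "&streaming.groupname=" + enc(group);
    }

    public static void main(String[] args) {
        System.out.println(groupQuery("error", "2017-06-01"));
    }
}
```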

If you want multiple buckets, use the parameter streaming.selection instead, which accepts any document selection expression: http://docs.vespa.ai/documentation/reference/document-select-language.html

To specify, e.g., two buckets, set streaming.selection (in the HTTP request or a Searcher) to

id.group=="[group1]" or id.group=="[group2]"
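When the selection is built programmatically, the group clauses can be joined with `or`, so that a document in any of the listed groups matches, and the whole expression URL-encoded before it goes into `streaming.selection`. A hedged Java sketch; `GroupSelection` and its helpers are hypothetical, and group names containing a double quote are simply rejected rather than escaped, since the escaping rules are not covered here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class GroupSelection {
    // Builds id.group=="g1" or id.group=="g2" ... for the given group names.
    static String anyOfGroups(List<String> groups) {
        if (groups.isEmpty()) {
            throw new IllegalArgumentException("at least one group is required");
        }
        return groups.stream()
                .map(GroupSelection::groupClause)
                .collect(Collectors.joining(" or "));
    }

    // Rejects names containing a double quote instead of guessing at escaping rules.
    static String groupClause(String group) {
        if (group.indexOf('"') >= 0) {
            throw new IllegalArgumentException("unsupported group name: " + group);
        }
        return "id.group==\"" + group + "\"";
    }

    // URL-encodes the selection for use as &streaming.selection=... in an HTTP request.
    static String selectionParam(List<String> groups) {
        return "streaming.selection="
                + URLEncoder.encode(anyOfGroups(groups), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> groups = List.of("2017-06-01", "2017-06-02");
        System.out.println(anyOfGroups(groups));
        System.out.println(selectionParam(groups));
    }
}
```

The same `anyOfGroups` string can be passed unencoded to `query.properties().set("streaming.selection", ...)` in a Searcher, since encoding is only needed for the HTTP request.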

See http://docs.vespa.ai/documentation/streaming-search.html

Note that streaming search should only be used when each query needs to search just one or a few buckets. It avoids building reverse indexes, which makes it cheaper in that special case (only).

Upvotes: 4

Jo Kristian Bergum

Reputation: 3184

The &streaming.* parameters are described here: http://docs.vespa.ai/documentation/reference/search-api-reference.html#streaming.groupname

This only applies to document types configured with mode=streaming. In the default mode (index), you cannot control query routing: http://docs.vespa.ai/documentation/reference/services-content.html#document

Upvotes: 0
