Vespa - Proton: Custom bucketing & Query

Question

References:

id scheme

Format: id:<namespace>:<document-type>:<key/value-pair>:<user-specified>

http://docs.vespa.ai/documentation/content/buckets.html
http://docs.vespa.ai/documentation/content/idealstate.html

It's possible to structure data with user-defined bucketing logic through the 32 LSBs (least significant bits) set by the n= / g= selections in the document ID format.

However, it isn't clear how the query logic can route queries to a specific bucket range chosen in advance.

E.g., data can be split into time ranges (start time / end time) if I define a number n that represents each range. All documents tagged this way end up in the same bucket (which then splits by document count / size as configured).
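The tagging scheme described above can be sketched as follows. This is a minimal illustration, assuming a one-day bucket width and non-negative timestamps; the class, the namespace `mynamespace`, and the document type `event` are hypothetical. It derives n from the timestamp so that every document in the same day gets the same n= value in its ID.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: maps a timestamp to a numeric time bucket and builds a
// document ID with the n= modifier, so documents in the same window share n.
final class TimeBucketIds {
    // Assumed granularity: one day per bucket. Pick whatever fits the data.
    static final Duration BUCKET_WIDTH = Duration.ofDays(1);

    // Whole bucket widths since the Unix epoch (assumes non-negative timestamps).
    static long timeBucket(Instant timestamp) {
        return timestamp.toEpochMilli() / BUCKET_WIDTH.toMillis();
    }

    // Builds id:<namespace>:<document-type>:n=<bucket>:<user-specified>.
    static String documentId(String namespace, String docType, Instant timestamp, String localId) {
        return "id:" + namespace + ":" + docType + ":n=" + timeBucket(timestamp) + ":" + localId;
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2018-03-01T12:00:00Z");
        // prints id:mynamespace:event:n=17591:event-42
        System.out.println(documentId("mynamespace", "event", t, "event-42"));
    }
}
```

Coarser buckets mean fewer n values per time-range query but larger groups; finer buckets mean the opposite.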

However, how do I write a search query over data indexed this way? Is it possible to tell the query processor to choose a specific bucket, or a range of buckets (in case the distribution algorithm has moved buckets)?

Jon · Accepted Answer

You can choose one bucket in a query by specifying the streaming.groupname query property.

Either in the HTTP request by adding

&streaming.groupname=[group]

or in a Searcher by

query.properties().set("streaming.groupname", "[group]");
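For the HTTP variant, the group name has to be URL-encoded like any other parameter value. A minimal sketch, assuming the default `http://localhost:8080/search/` endpoint and a plain `query` parameter; only `streaming.groupname` comes from the answer, and the helper class is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a search URL that restricts a streaming query to one group.
final class GroupQueryUrl {
    static String withGroup(String endpoint, String query, String group) {
        // Encode each value so spaces and reserved characters (&, =, :) survive the request.
        return endpoint
                + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&streaming.groupname=" + URLEncoder.encode(group, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints http://localhost:8080/search/?query=foo&streaming.groupname=my+group
        System.out.println(withGroup("http://localhost:8080/search/", "foo", "my group"));
    }
}
```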

If you want multiple buckets, use the parameter streaming.selection instead, which accepts any document selection expression: http://docs.vespa.ai/documentation/reference/document-select-language.html

To specify, e.g., two buckets, set streaming.selection (in the HTTP request or a Searcher) to

id.group=="[group1]" or id.group=="[group2]"

Each document has exactly one group, so the conditions must be combined with or.
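When the set of groups is computed at query time (for example, every group covering a time range), the selection string can be generated. A minimal sketch with a hypothetical helper; it assumes group names never need escaping inside a double-quoted selection literal and rejects any that contain `"` or `\` rather than guessing the escaping rules.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds a streaming.selection expression matching any of the given groups.
final class GroupSelection {
    static String anyOfGroups(List<String> groups) {
        if (groups.isEmpty()) {
            throw new IllegalArgumentException("at least one group is required");
        }
        for (String g : groups) {
            // Refuse values that would need escaping inside a double-quoted literal.
            if (g.contains("\"") || g.contains("\\")) {
                throw new IllegalArgumentException("unsupported character in group: " + g);
            }
        }
        // A document has exactly one group, so the terms are OR-ed together.
        return groups.stream()
                .map(g -> "id.group==\"" + g + "\"")
                .collect(Collectors.joining(" or "));
    }

    public static void main(String[] args) {
        // prints id.group=="group1" or id.group=="group2"
        System.out.println(anyOfGroups(List.of("group1", "group2")));
    }
}
```

The resulting string is what goes into streaming.selection, either URL-encoded in the HTTP request or through query.properties().set("streaming.selection", ...) in a Searcher.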

See http://docs.vespa.ai/documentation/streaming-search.html

Note that streaming search should only be used when each query needs to search just one or a few buckets. It avoids building reverse indexes, which makes it cheaper in that special case (and only then).
