Reputation: 1712
(Couchbase 4.5) Suppose I have the following object stored in my couchbase instance:
{
parentArray : [
{
childArray: [{value: 'v1'}, {value:'v2'}]
},
{
childArray: [{value: 'v1'}, {value: 'v3'}]
}
]
}
Now I want to select the distinct elements from childArray
, which should return an array equal to ['v1', 'v2', 'v3'].
I have a couple solutions to this. My first thought was to go ahead and use the UNNEST
operation:
SELECT DISTINCT ca.value FROM `my-bucket` AS b UNNEST b.parentArray AS pa UNNEST pa.childArray AS ca WHERE _class="someclass" AND dataType="someDataType";
With this approach I get a polynomial explosion in the number of scanned elements (due to the unnest'ing of two arrays), and the query takes a bit of time to complete (for my real data on the order of 24 seconds). When I remove unnest, and simply query for distinct elements on the top-level elements (those adjacent to parentArray), it takes on the order of milliseconds.
Another solution is to handle this in the application code, by simply iterating through the returned values and finding the distinct values my-self. This approach is bad, because it brings too much data into the application space.
Any help please!
Thank you!
UPDATE: Looks like without a "WHERE" clause using the "UNNEST" statements the performance is fast. So do I need Array Indexes here?
UPDATE: Nevermind about the previous update, since there is no index elements in the where clause. Also, but I do notice that if I remove the UNNEST OR the WHERE then the query is fast. Moreover, looking at the explain and adding a GSI for compound index (_class, dataType) I can see "IndexScan" on the provided index.
Upvotes: 0
Views: 1826
Reputation: 7414
INSERT INTO default values("3",{ "parentArray" : [ { "childArray": [{"value": 'v1'}, {"value":'v2'}] }, { "childArray": [{"value": 'v1'}, {"value": 'v3'}] } ] });
SELECT ARRAY_DISTINCT(ARRAY v.`value` FOR v WITHIN parentArray END) FROM default;
OR
SELECT ARRAY_DISTINCT(ARRAY_FLATTEN(
ARRAY ARRAY v.`value` FOR v IN ca.childArray END FOR ca IN parentArray END,
2)) FROM default;
You can add where clause. If this requires across the documents use the following.
INSERT INTO default values("4",{ "parentArray" : [ { "childArray": [{"value": 'v5'}, {"value":'v2'}] }, { "childArray": [{"value": 'v1'}, {"value": 'v3'}] } ] });
SELECT ARRAY_DISTINCT(ARRAY_FLATTEN(ARRAY_AGG(ARRAY v.`value` FOR v WITHIN parentArray END),2)) FROM default;
SELECT ARRAY_DISTINCT(ARRAY_FLATTEN(ARRAY_AGG(ARRAY_FLATTEN(ARRAY ARRAY v.`value` FOR v IN ca.childArray END FOR ca IN parentArray END,2)),2)) FROM default;
Upvotes: 3