Reputation: 99
I'm new on couchbase and I'm doing some queries using N1QL but it takes a lot of time (9 minutes) My data have 200.000 documents and the documents have nested types, the number of nested type inside the documents is 6.000.000 distributed between the 200.000 documents, so the UNNEST operation is important. One sample of my data is:
{"p_partkey": 2, "lineorder": [{"customer": [{"c_city": "INDONESIA1"}], "lo_supplycost": 54120, "orderdate": [{"d_weeknuminyear": 19}], "supplier": [{"s_phone": "16-789-973-6601|"}], "commitdate": [{"d_year": 1993}], "lo_tax": 7}, {"customer": [{...
One query I'm doing is:
SELECT SUM(l.lo_extendedprice*l.lo_discount*0.01) as revenue
from part p UNNEST p.lineorder l UNNEST l.orderdate o
where o.d_year=1993 and l.lo_discount between 1 and 3 and l.lo_quantity<25;
The data has the fields mentioned above. But it takes 9 minutes to execute. I'm using only my computer to do it, so just one node. My computer have 16GB fo RAM and the cluster RAM cota is 3,2GB with just one bucket with 3GB. My data has 2,45GB of total size. I have used the calculation mentioned here: http://docs.couchbase.com/admin/admin/Concepts/bp-sizingGuidelines.html to size my cluster and bucket. I'm doing something wrong or this time is correct for this amount of data?
For now I have created the index like:
CREATE INDEX idx_discount ON part( DISTINCT ARRAY l.lo_discount FOR l IN lineorder END );
CREATE INDEX idx_quantity ON part( DISTINCT ARRAY l.lo_quantity FOR l IN lineorder END );
CREATE INDEX idx_year ON part( DISTINCT ARRAY o.d_year FOR o IN ( DISTINCT ARRAY l.orderdate FOR l IN lineorder END ) END );
But the database dont use it.
One query example is:
SELECT SUM(l.lo_extendedprice*l.lo_discount*0.01) as revenue
from part p UNNEST p.lineorder l UNNEST l.orderdate o
where o.d_year=1993 and l.lo_discount between 1 and 3 and l.lo_quantity<25;
Another example, I have created the index:
CREATE INDEX teste3 ON `part` (DISTINCT ARRAY l.lo_quantity FOR l IN lineorder END );
and queried:
select l.lo_quantity from part as p UNNEST p.lineorder l where l.lo_quantity>20 limit 3
Because I have deleted the primary index, it dont execute. Returning the error: "No primary index on keyspace part. Use CREATE PRIMARY INDEX to create one.",
Upvotes: 1
Views: 863
Reputation: 99
After reading the blog on:http://blog.couchbase.com/2016/may/1.making-most-of-your-arrays..-with-covering-array-indexes-and-more I discovedered the problem:
If you create the INDEX like this:
CREATE INDEX iflight_day
ON `travel-sample` ( DISTINCT ARRAY v.flight FOR v IN schedule END );
You have to use the same letters on the queries, in this case the letter 'v'.
SELECT v.day from `travel-sample` as t UNNEST t.schedule v where v.flight="LY104";
The same is the case for deepest levels:
CREATE INDEX inested ON `travel-sample`
( DISTINCT ARRAY (DISTINCT ARRAY y.flight FOR y IN x.special_flights END) FOR x IN schedule END);
In this case you have to use 'y' and 'x':
SELECT x.day from `travel-sample` as t UNNEST t.schedule x UNNEST x.special_flights y where y.flight="AI444";
And now every thing work fine.
But another problem arises when I queried like this:
SELECT * from `travel-sample` as t UNNEST t.schedule x UNNEST x.special_flights y
where x.day=7 and y.flight="AI444";
Only the day index created like above is used.
CREATE INDEX day
ON `travel-sample` ( DISTINCT ARRAY y.day FOR y IN schedule END );
It's used only one index, sometimes 'day', sometimes 'inested'.
Upvotes: 2
Reputation: 2445
You can use Couchbase 4.5 (GA upcoming) with array indexing. Array indexing can be used with UNNEST. It allows you to index individual elements of arrays, including arrays nested within other arrays.
You can create the following indexes, and then use EXPLAIN to make sure there is an IndexScan using your intended index.
CREATE INDEX idx_discount ON part( DISTINCT ARRAY l.lo_discount FOR l IN lineorder END );
CREATE INDEX idx_quantity ON part( DISTINCT ARRAY l.lo_quantity FOR l IN lineorder END );
CREATE INDEX idx_year ON part( DISTINCT ARRAY ( DISTINCT ARRAY o.d_year FOR o IN l.orderdate END ) FOR l IN lineorder END );
Upvotes: 1