user1151
user1151

Reputation:

How to query a multi index in RethinkDB over an array of objects

I'm working with a data set that looks something like this:

"bitrates": [
  {
    "format":  "mp3" ,
    "rate":  "128K"
  } ,
  {
    "format":  "aac" ,
    "rate":  "192K"
  }
] ,
"details": [ ... ] ,
"id": 1 ,
"name":  "For Those About To Rock We Salute You" ,
"price": 1026 ,
"requires_shipping": false ,
"sku":  "ALBUM-1" 
}

And I wanted to create a secondary index on bitrates, flexing {multi:true}. This was my attempt:

r.db("music").table("catalog").indexCreate("bitrates", {multi: true})

The index built just fine, but when I query it, nothing returns - which seems contrary to every example I've read here:

http://rethinkdb.com/docs/secondary-indexes/javascript/

The query I wrote is this:

r.db("music").table("catalog").getAll(["mp3", "128K"], {index : "bitrates"})

There is no error, just 0 results (and I have 300 or so documents with this exact data).

I'm using RethinkDB 2.0 RC1.

Upvotes: 8

Views: 2853

Answers (2)

mlucy
mlucy

Reputation: 5289

Keys of secondary indexes can't be objects right now:

> r.table('foo').indexCreate('bitrates', {multi: true})
> r.table('foo').getAll({format: "mp3", rate: "128K"}, {index: 'bitrates'})
RqlRuntimeError: Secondary keys must be a number, string, bool, pseudotype, or array

You can track this issue at https://github.com/rethinkdb/rethinkdb/issues/2773 .

For a work-around, you can do this:

> r.table('foo').indexCreate('bitrates', function(row){
    return row('bitrates').map(function(bitrate){return bitrate.coerceTo('array');})
  }, {multi: true});
> r.table('foo').getAll(r.expr({format: "mp3", rate: "128K"}).coerceTo('array'), {index: 'bitrates'})

Upvotes: 1

Nate Kohari
Nate Kohari

Reputation: 2224

When you create an index for a column, the values in the column are used literally as the keys of the index. In your case, the keys for your bitrates index would be the objects within the bitrates array in the document.

It seems like what you want is an index that's derived from the values in a field of the document. To do that, you want to define a custom indexing function that reduces the document to just the data you care about. The easiest way to experiment with them is to start by writing a query, and once you're happy with the results, converting it into an indexCreate() statement.

Here's a statement that grabs your sample document (with id 1), and plucks the format and rate terms from all of the objects in its bitrate array, and then merges them together to create a distinct set of strings:

r.db('music').table('catalog').get(1).do(function(row) {
  return row('bitrates').map(function(bitrate) {
    return [bitrate('format'), bitrate('rate')];
  }).reduce(function(left, right) {
    return left.setUnion(right);
  })
})

Running this statement will return the following:

["mp3", "128K", "aac", "192K"]

This looks good, so we can use our function to create an index. In this case, since we're expecting the indexing function to return a set of items, we also want to specify {multi: true} to ensure we can query by the items in the set, not the set itself:

r.db('music').table('catalog').indexCreate('bitrates', function(row) {
  return row('bitrates').map(function(bitrate) {
    return [bitrate('format'), bitrate('rate')];
  }).reduce(function(left, right) {
    return left.setUnion(right);
  })
}, {multi: true})

Once created, you can query your index like this:

r.db('music').table('catalog').getAll('mp3', {index: 'bitrates'})

You can also supply multiple query terms, to match rows that match any of the items:

r.db('music').table('catalog').getAll('mp3', '128K', {index: 'bitrates'})

However, if a single document matches more than one term in your query, it will be returned more than once. To fix this, add distinct():

r.db('music').table('catalog').getAll('mp3', '128K', {index: 'bitrates'}).distinct()

If necessary, you might also consider using downcase() to normalize the casing of the terms used in the secondary index.

You could also skip all of the indexing business entirely and use a filter() query:

r.db('music').table('catalog').filter(function(row) {
  return row('bitrates').map(function(bitrates) {
    return [bitrates('format'), bitrates('rate')];
  }).reduce(function(left, right) {
    return left.setUnion(right);
  }).contains('mp3');
})

That said, if you're almost always querying your table in the same manner, generating a secondary index using a custom function will result in dramatically better performance.

Upvotes: 13

Related Questions