bryan

Reputation: 9389

NoSQL Structure for handling labeled tags

Currently I have hundreds of thousands of files like so:

{
    "_id": "1234567890",
    "type": "file",
    "name": "Demo File",
    "file_type": "application/pdf",
    "size": "1400",
    "timestamp": "1491421149",
    "folder_id": "root"
}

Currently, I index all the names, and a client can search for files by name. These files also need tags associated with them, and each tag has a specific label.

An example would be:

{
    "tags": [
        { "client": "john doe" },
        { "office": "virginia" },
        { "ssn": "1234" }

    ]
}

Is adding the tags array to my above file object the ideal solution if I want to be able to search thousands of files with a client of John Doe?

The only other solution I can think of is having one object per tag, with an array of the file IDs associated with that tag, like so:

{
    "_id": "11111111",
    "type": "tag",
    "label": "client",
    "items": [
        "1234567890",
        "1222222222",
        "1333333333"
    ]
}

With this being a LOT of objects I need to add tags to, I'd rather do it the most efficient way possible FIRST so I don't have to backtrack in the near future when I start running into issues.

Any guidance would be greatly appreciated.

Upvotes: 2

Views: 2315

Answers (2)

markwatsonatx

Reputation: 3491

Your original design, with a tags array, works well with Cloudant Search: https://console.ng.bluemix.net/docs/services/Cloudant/api/search.html#search.

With this approach you would define a single design document that will index any tag in the tags array. You do not have to create different views for different tags and you can use the Lucene syntax for queries: http://lucene.apache.org/core/4_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Overview.

So, using your example, if you have a document that looks like this with tags:

{
  "_id": "1234567890",
  "type": "file",
  "name": "Demo File",
  "file_type": "application/pdf",
  "size": "1400",
  "timestamp": "1491421149",
  "folder_id": "root",
  "tags": [
    { "client": "john doe" },
    { "office": "virginia" },
    { "ssn": "1234" }
  ]
}

You can create a design document that indexes each tag like so:

{
  "_id": "_design/searchFiles",
  "views": {},
  "language": "javascript",
  "indexes": {
    "byTag": {
      "analyzer": "standard",
      "index": "function (doc) {\n  if (doc.type === \"file\" && doc.tags) {\n    for (var i=0; i<doc.tags.length; i++) {\n      for (var name in doc.tags[i]) {\n        index(name, doc.tags[i][name]);\n      }\n    }\n  }\n}"
    }
  }
}

Unescaped, the index function looks like this:

function (doc) {
  if (doc.type === "file" && doc.tags) {
    for (var i=0; i<doc.tags.length; i++) {
      for (var name in doc.tags[i]) {
        index(name, doc.tags[i][name]);
      }
    }
  }
}
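To check what this function would index for the sample document, it can be run locally with a stand-in for `index`. This is only a sketch: Cloudant supplies the real `index` at indexing time, and `indexFile` and `collectIndexed` are hypothetical names used here so the callback can be stubbed.

```javascript
// The index function from the design document, with `index` passed in
// as a parameter so it can be stubbed outside Cloudant (an assumption
// made for local experimentation only).
function indexFile(doc, index) {
  if (doc.type === "file" && doc.tags) {
    for (var i = 0; i < doc.tags.length; i++) {
      for (var name in doc.tags[i]) {
        index(name, doc.tags[i][name]);
      }
    }
  }
}

// Hypothetical helper: records every (field, value) pair that would be indexed.
function collectIndexed(doc) {
  var pairs = [];
  indexFile(doc, function (field, value) {
    pairs.push([field, value]);
  });
  return pairs;
}

var sampleDoc = {
  _id: "1234567890",
  type: "file",
  name: "Demo File",
  tags: [{ client: "john doe" }, { office: "virginia" }, { ssn: "1234" }]
};

// One pair per tag: the label becomes the search field, the tag value its content.
console.log(collectIndexed(sampleDoc));
```

Each label (`client`, `office`, `ssn`) ends up as its own searchable field, which is why a single index covers every tag.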

Then you would search like this:

https://your_cloudant_account.cloudant.com/your_db/_design/searchFiles/_search/byTag
?q=client:jack+OR+office:virginia
&include_docs=true
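If you build this URL in code, the query string needs URL-encoding, especially for values containing spaces such as "john doe". A minimal sketch, assuming a JavaScript runtime with the standard `URL` class (Node.js or a browser); `buildTagSearchUrl`, `myaccount` and `mydb` are placeholder names, not part of Cloudant:

```javascript
// Builds a search URL for the byTag index in the searchFiles design document.
// `account` and `db` are placeholders for your own account and database names.
function buildTagSearchUrl(account, db, query) {
  var url = new URL(
    "https://" + account + ".cloudant.com/" + db + "/_design/searchFiles/_search/byTag"
  );
  url.searchParams.set("q", query);             // Lucene query, encoded automatically
  url.searchParams.set("include_docs", "true"); // return the full document with each hit
  return url.toString();
}

// Quoting the value asks Lucene to match "john doe" as a phrase.
var searchUrl = buildTagSearchUrl("myaccount", "mydb", 'client:"john doe"');
console.log(searchUrl);
```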

Upvotes: 2

cbickel

Reputation: 56

The solution that comes to mind is to use map/reduce views.

To do that, you would add the tags to your original document:

{
    "_id": "1234567890",
    "type": "file",
    "name": "Demo File",
    "file_type": "application/pdf",
    "size": "1400",
    "timestamp": "1491421149",
    "folder_id": "root",
    "client": "john",
    ...
}

Afterwards, you can create a design document that looks like this:

{
    "_id": "_design/query",
    "views": {
        "byClient": {
            "map": "function(doc) { if(doc.client) { emit(doc.client, doc._id) }}"
        }
    }
}

Once the view has been built, you can query it with:

GET /YOURDB/_design/query/_view/byClient?key="john"

By adding the query parameter include_docs=true, the whole document will be returned, instead of the id.
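Note that `key` must be valid JSON, so a string key keeps its double quotes and should be URL-encoded when the request is built in code. A small sketch, assuming a CouchDB-style base URL; `buildViewUrl` and `http://localhost:5984` are illustrative placeholders:

```javascript
// Builds the query URL for the byClient view.
// The key is JSON-encoded first (john becomes "john"), then URL-encoded.
function buildViewUrl(baseUrl, db, clientName, includeDocs) {
  var url = baseUrl + "/" + encodeURIComponent(db) +
    "/_design/query/_view/byClient" +
    "?key=" + encodeURIComponent(JSON.stringify(clientName));
  if (includeDocs) {
    url += "&include_docs=true"; // return whole documents instead of only the emitted value
  }
  return url;
}

console.log(buildViewUrl("http://localhost:5984", "YOURDB", "john", true));
```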

You can also store your tags in a tags attribute, but then you have to update the map function to match the new design.
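For the tags array from the question, the map function could emit one row per tag, keyed by label and value. This is a sketch rather than a tested view: inside a real design document the function takes only `doc` and calls the global `emit`, and the name `mapByTag` and the extra `emit` parameter exist here only so the function can be tried out locally.

```javascript
// Map function for documents that keep tags in a `tags` array of
// single-key objects, e.g. { "client": "john doe" }.
// Emits a [label, value] key so one view covers every label.
function mapByTag(doc, emit) {
  if (doc.type === "file" && doc.tags) {
    for (var i = 0; i < doc.tags.length; i++) {
      for (var label in doc.tags[i]) {
        // The value is null because each view row already carries the document id.
        emit([label, doc.tags[i][label]], null);
      }
    }
  }
}

// Local stand-in for CouchDB's emit: collect the rows the view would contain.
var rows = [];
mapByTag(
  {
    _id: "1234567890",
    type: "file",
    tags: [{ client: "john doe" }, { office: "virginia" }, { ssn: "1234" }]
  },
  function (key, value) {
    rows.push({ key: key, value: value });
  }
);
console.log(JSON.stringify(rows));
```

Such a view would then be queried with a JSON array as the key, for example `key=["client","john doe"]`.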

More information about views can be found here: http://docs.couchdb.org/en/2.0.0/api/ddoc/views.html

Upvotes: 0
