Reputation: 9389
Currently I have hundreds of thousands of files like so:
{
  "_id": "1234567890",
  "type": "file",
  "name": "Demo File",
  "file_type": "application/pdf",
  "size": "1400",
  "timestamp": "1491421149",
  "folder_id": "root"
}
Currently, I index all the names, and a client can search for files by name. These files also have tags that need to be associated with them, and each tag has a specific label.
An example would be:
{
  "tags": [
    { "client": "john doe" },
    { "office": "virginia" },
    { "ssn": "1234" }
  ]
}
Is adding the tags array to my file object above the ideal solution if I want to be able to search thousands of files for a client of John Doe?
The only other solution I can think of is having an object per tag, with an array of file IDs associated with each tag, like so:
{
  "_id": "11111111",
  "type": "tag",
  "label": "client",
  "items": [
    "1234567890",
    "1222222222",
    "1333333333"
  ]
}
With this being a LOT of objects I need to add tags to, I'd rather do it the most efficient way possible FIRST so I don't have to backtrack in the near future when I start running into issues.
Any guidance would be greatly appreciated.
Upvotes: 2
Views: 2315
Reputation: 3491
Your original design, with a tags array, works well with Cloudant Search: https://console.ng.bluemix.net/docs/services/Cloudant/api/search.html#search.
With this approach you would define a single design document that will index any tag in the tags array. You do not have to create different views for different tags and you can use the Lucene syntax for queries: http://lucene.apache.org/core/4_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Overview.
So, using your example, if you have a document that looks like this with tags:
{
  "_id": "1234567890",
  "type": "file",
  "name": "Demo File",
  "file_type": "application/pdf",
  "size": "1400",
  "timestamp": "1491421149",
  "folder_id": "root",
  "tags": [
    { "client": "john doe" },
    { "office": "virginia" },
    { "ssn": "1234" }
  ]
}
You can create a design document that indexes each tag like so:
{
  "_id": "_design/searchFiles",
  "views": {},
  "language": "javascript",
  "indexes": {
    "byTag": {
      "analyzer": "standard",
      "index": "function (doc) {\n if (doc.type === \"file\" && doc.tags) {\n for (var i=0; i<doc.tags.length; i++) {\n for (var name in doc.tags[i]) {\n index(name, doc.tags[i][name]);\n }\n }\n }\n}"
    }
  }
}
The function looks like this:
function (doc) {
  if (doc.type === "file" && doc.tags) {
    for (var i = 0; i < doc.tags.length; i++) {
      for (var name in doc.tags[i]) {
        index(name, doc.tags[i][name]);
      }
    }
  }
}
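To check which fields this function would hand to the search index, it can be run locally against the sample document. The sketch below is only an illustration: `indexTags` and `collectIndexedFields` are hypothetical names, and `index` is passed in as a parameter so that a simple recording stub can stand in for the `index()` that Cloudant provides at indexing time.

```javascript
// Same logic as the index function above; `index` is a parameter here
// (an assumption for local testing) instead of Cloudant's built-in.
function indexTags(doc, index) {
  if (doc.type === "file" && doc.tags) {
    for (var i = 0; i < doc.tags.length; i++) {
      for (var name in doc.tags[i]) {
        index(name, doc.tags[i][name]);
      }
    }
  }
}

// Hypothetical helper: records every (field, value) pair passed to index().
function collectIndexedFields(doc) {
  var fields = [];
  indexTags(doc, function (name, value) {
    fields.push([name, value]);
  });
  return fields;
}

var sampleDoc = {
  _id: "1234567890",
  type: "file",
  name: "Demo File",
  tags: [
    { client: "john doe" },
    { office: "virginia" },
    { ssn: "1234" }
  ]
};

console.log(collectIndexedFields(sampleDoc));
// [ [ 'client', 'john doe' ], [ 'office', 'virginia' ], [ 'ssn', '1234' ] ]
```

Each tag label becomes its own search field, which is why a single index covers every label without a per-tag view.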
Then you would search like this:
https://your_cloudant_account.cloudant.com/your_db/_design/searchFiles/_search/byTag
?q=client:jack+OR+office:virginia
&include_docs=true
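If the query is assembled in code, the tag values need quoting and URL-encoding (a value such as john doe contains a space). The following is a rough sketch, not an official client: `buildTagSearchUrl` is a hypothetical helper, the account URL is a placeholder, and it assumes that wrapping each value in double quotes as a Lucene phrase is the matching behaviour you want.

```javascript
// Hypothetical helper: builds a search URL for the byTag index from a
// { label: value } object, joining the clauses with OR.
function buildTagSearchUrl(baseUrl, db, tags) {
  const q = Object.entries(tags)
    .map(([label, value]) => {
      // Escape backslashes and double quotes inside the quoted phrase.
      const escaped = String(value).replace(/(["\\])/g, "\\$1");
      return `${label}:"${escaped}"`;
    })
    .join(" OR ");

  // URLSearchParams takes care of percent-encoding the query string.
  const params = new URLSearchParams({ q, include_docs: "true" });
  return `${baseUrl}/${encodeURIComponent(db)}/_design/searchFiles/_search/byTag?${params}`;
}

console.log(
  buildTagSearchUrl("https://your_cloudant_account.cloudant.com", "your_db", {
    client: "john doe",
    office: "virginia"
  })
);
```

Swapping `" OR "` for `" AND "` would narrow the search to files that carry every listed tag.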
Upvotes: 2
Reputation: 56
The solution that comes to mind is to use a MapReduce view.
To do that, you would add the tags as attributes of your original document:
{
  "_id": "1234567890",
  "type": "file",
  "name": "Demo File",
  "file_type": "application/pdf",
  "size": "1400",
  "timestamp": "1491421149",
  "folder_id": "root",
  "client": "john",
  ...
}
Afterwards, you can create a design document that looks like this:
{
  "_id": "_design/query",
  "views": {
    "byClient": {
      "map": "function(doc) { if(doc.client) { emit(doc.client, doc._id) }}"
    }
  }
}
Once the view has been built, you can query it with:
GET /YOURDB/_design/query/_view/byClient?key="john"
By adding the query parameter include_docs=true, the whole document will be returned instead of just the ID.
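Note that the key parameter has to be valid JSON (hence the double quotes around john) and should be URL-encoded when sent from code. A minimal sketch, assuming a hypothetical helper `buildViewUrl` and a placeholder base URL:

```javascript
// Hypothetical helper: builds the URL for the byClient view.
// The key is JSON-encoded first, then percent-encoded by URLSearchParams.
function buildViewUrl(baseUrl, db, key, includeDocs) {
  const params = new URLSearchParams({ key: JSON.stringify(key) });
  if (includeDocs) {
    params.set("include_docs", "true");
  }
  return `${baseUrl}/${encodeURIComponent(db)}/_design/query/_view/byClient?${params}`;
}

console.log(buildViewUrl("https://example.invalid", "YOURDB", "john", true));
// https://example.invalid/YOURDB/_design/query/_view/byClient?key=%22john%22&include_docs=true
```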
You can also write your tags into a tags attribute, but then you have to update the map function to match the new design.
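As one possible way to do that, the map function could emit a [label, value] array key for every entry in the tags array from the question. This is only a sketch under that assumption: `mapTags` and `collectRows` are hypothetical names, and `emit` is passed in as a parameter so the logic can be tried outside the database; in a real design document the function would take only `doc` and call the built-in `emit`.

```javascript
// Sketch of a map function for the tags-array design. One row is
// emitted per tag, keyed by [label, value].
function mapTags(doc, emit) {
  if (doc.type === "file" && Array.isArray(doc.tags)) {
    doc.tags.forEach(function (tag) {
      Object.keys(tag).forEach(function (label) {
        // The value is null because each view row already carries the doc id.
        emit([label, tag[label]], null);
      });
    });
  }
}

// Hypothetical local stand-in for the view engine: records emitted rows.
function collectRows(doc) {
  var rows = [];
  mapTags(doc, function (key, value) {
    rows.push({ id: doc._id, key: key, value: value });
  });
  return rows;
}

console.log(collectRows({
  _id: "1234567890",
  type: "file",
  tags: [{ client: "john doe" }, { office: "virginia" }]
}));
```

With a view like this, a lookup would use an array key, for example key=["client","john doe"], and a single view would serve every tag label.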
More information about views can be found here: http://docs.couchdb.org/en/2.0.0/api/ddoc/views.html
Upvotes: 0