Kumar Vikramjeet

Reputation: 263

Handling string and string array pair effectively

I'm saving TBs worth of data in ES in the following manner:

"class": {
  "type": "nested",
  "properties": {
    "name": {"type": "string"},
    "methods": {
      "properties": {
        "name": {"type": "string"}
      }
    }
  }
}

Simply put, I'm saving data as (class1, [method1, method2,...]), (class2, [method3, method4,...]) ...

I saw in the ES docs that all data is reduced to Lucene key-value pairs, but I'm not sure whether that is relevant here.

Would it decrease search latency if I arranged the data as follows instead: {class1, method1}, {class1, method2}, ... {class2, method3}, ...

Sample query: Search for given class name and method name pair, and show all docs having that pair in the index.

I'd appreciate any help. Please suggest a better way to handle this if there is one.

Upvotes: 0

Views: 48

Answers (1)

BrookeB

Reputation: 1769

Between your two options (one nested doc per class vs. one nested doc per class and method pair), there should not be a noticeable difference in search times. Personally, I would prefer the first option, since it seems a better model of your data. It also means fewer documents in total: keep in mind that a "nested" doc in ES is really just another true document in Lucene under the hood. ES simply keeps the nested docs located directly next to their parent doc for efficient relationship management.

Internally, ES treats every value as an array, so it is certainly suited to handle the first option. Assuming an example mapping like this:

PUT /test_index/
{
  "mappings": {
    "my_type": {
      "properties": {
        "someField": { "type": "string" },
        "classes": {
          "type": "nested", 
          "properties": {
            "class": { "type":"string", "index":"not_analyzed" },
            "method": { "type": "string", "index":"not_analyzed" }
          }
        }
      }
    }
  }
}

You can then input your documents, such as:

POST test_index/my_type
{
  "someField":"A",
  "classes": {
    "class":"Java.lang.class1",
    "method":["myMethod1","myMethod2"]
  }
}

POST test_index/my_type
{
  "someField":"B",
  "classes": {
    "class":"Java.lang.class2",
    "method":["myMethod3","myMethod4"]
  }
}
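
Since every value can be an array, a single document can also carry several classes. This is only a sketch: the "someField" value "C" is made up for illustration, and it assumes the same mapping as above. Each object in the "classes" array is stored as its own nested doc, so a class name stays paired with its own methods only:

POST test_index/my_type
{
  "someField":"C",
  "classes": [
    { "class":"Java.lang.class1", "method":["myMethod1","myMethod2"] },
    { "class":"Java.lang.class2", "method":["myMethod3","myMethod4"] }
  ]
}

A nested query for "Java.lang.class1" together with "myMethod3" should not match this document, because no single nested object contains both values.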

In order to satisfy your sample query, you can simply use a bool filter inside a nested query. For example:

GET test_index/my_type/_search
{
  "query": {
    "nested": {
      "path": "classes",
      "query": {
        "bool": {
          "filter": [
            { "term": {"classes.class":"Java.lang.class2"} },
            { "term": {"classes.method":"myMethod3"} }
          ]
        }
      }
    }
  }
}

This would return the second document from my example.
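
For comparison, here is a sketch of how the second document might look under your other layout (one nested object per class and method pair). It assumes the same mapping as above, and the field values are just my example data rearranged:

POST test_index/my_type
{
  "someField":"B",
  "classes": [
    { "class":"Java.lang.class2", "method":"myMethod3" },
    { "class":"Java.lang.class2", "method":"myMethod4" }
  ]
}

The nested query above would match this document unchanged. The difference is that it is stored with two nested docs instead of one, which is why I would expect the first layout to keep the total document count lower.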

Upvotes: 1
