Abhirath Mahipal

Reputation: 956

ElasticSearch Accessing Nested Documents in Script - Null Pointer Exception

Gist: I'm trying to write a custom filter on nested documents using Painless. I want to add checks for the case where there are no nested documents, so that I avoid a null_pointer_exception.

I have the following mapping (simplified and obfuscated):

{
  "video_entry" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "captions_added" : {
          "type" : "boolean"
        },
        "category" : {
          "type" : "keyword"
        },
        "is_votable" : {
          "type" : "boolean"
        },
        "members" : {
          "type" : "nested",
          "properties" : {
            "country" : {
              "type" : "keyword"
            },
            "date_of_birth" : {
              "type" : "date"
            }
          }
        }
      }
    }
  }
}

Each video_entry document can have 0 or more members nested documents.

Sample Document

{
   "captions_added": true,
   "category"      : "Mental Health",
   "is_votable"    : true,
   "members": [
        {"country": "Denmark", "date_of_birth": "1998-04-04T00:00:00"},
        {"country": "Denmark", "date_of_birth": "1999-05-05T00:00:00"}
   ]
}

If one or more nested documents exist, we want to write Painless scripts that check certain fields across all the nested documents. My script works on indices with a few documents, but when I try it on a larger set of documents I get null pointer exceptions despite having every null check I could think of. I've tried various access patterns and error-checking mechanisms, but I still get exceptions.

POST /video_entry/_search
{
  "query": {
   "script": {
     "script": {
       "source": """
          // various NULL checks that I already tried
          // also tried short circuiting on finding null values
          if (!params['_source'].empty && params['_source'].containsKey('members')) {


              def total = 0;
          
          
              for (item in params._source.members) {
                // custom logic here
                // if above logic holds true 
                // total += 1; 
              } 
          
              return total > 3;
         }
         
         return true;
          
       """,
       "lang": "painless"
     }
   }
  }
}

Other Statements That I've Tried

if (params._source == null) {
    return true;
}

if (params._source.members == null) {
    return true;
}

if (!ctx._source.contains('members')) {
    return true;
}

if (!params['_source'].empty && params['_source'].containsKey('members') && 
     params['_source'].members.value != null) {
    
    // logic here

}

if (doc.containsKey('members')) {
  for (mem in params._source.members) {
  }

}

Error Message

&& params._source.members",
                 ^---- HERE"

 "caused_by" : {
            "type" : "null_pointer_exception",
            "reason" : null
          }

I've looked into changing the structure (flattening the document) and the usage of must_not as indicated in this answer. They don't suit our use case as we need to incorporate some more custom logic.

Some tutorials use ctx, some use doc and some use params. To add to the confusion, Debug.explain(doc.members) and Debug.explain(params._source.members) return empty responses, so I'm having a hard time figuring out the types.



Any help is appreciated.

Upvotes: 4

Views: 1779

Answers (1)

Paulo

Reputation: 10431

TL;DR

Elasticsearch flattens arrays of objects, such that

{
  "group" : "fans",
  "user" : [ 
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

turns into:

{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}

To access an inner value of members, you need to reference it as doc['members.<field>'], because members does not exist as a field on its own.
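One rough way to picture the flattening is the Python sketch below. It is only a model, not anything Elasticsearch runs: `flatten_object_fields` is a hypothetical helper, it assumes `members` is mapped as a plain object field (not `nested`), and it sorts and de-duplicates values to mimic how keyword doc values are stored.

```python
def flatten_object_fields(source, prefix=""):
    """Model how an object-mapped document looks to doc[...] in a script.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    """
    flat = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        # A single value is treated like a one-element array.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Inner object: recurse and merge its values under dotted paths.
                inner = flatten_object_fields(item, path + ".")
                for sub_path, sub_values in inner.items():
                    flat.setdefault(sub_path, set()).update(sub_values)
            else:
                flat.setdefault(path, set()).add(item)
    # Sorted, de-duplicated lists, similar to keyword doc values.
    return {path: sorted(values) for path, values in flat.items()}


doc = {
    "category": "Mental Health",
    "members": [
        {"country": "Denmark", "date_of_birth": "1998-04-04T00:00:00"},
        {"country": "Denmark", "date_of_birth": "1999-05-05T00:00:00"},
    ],
}
flat = flatten_object_fields(doc)
print(flat["members.country"])  # ['Denmark']
print("members" in flat)        # False
```

In this model there is no `members` key at all, only `members.country` and `members.date_of_birth`, and a document with `"members": []` produces neither key. The link between a given country and its date of birth is also lost.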

Details

As you may know, Elastic handles inner documents in its own way. [doc]

So you will need to reference them accordingly.

Here is what I did to make it work, using the Dev Tools console in Kibana.

PUT /so_test/

PUT /so_test/_mapping
{
  "properties" : {
    "captions_added" : {
      "type" : "boolean"
    },
    "category" : {
      "type" : "keyword"
    },
    "is_votable" : {
      "type" : "boolean"
    },
    "members" : {
      "properties" : {
        "country" : {
          "type" : "keyword"
        },
        "date_of_birth" : {
          "type" : "date"
        }
      }
    }
  }
}

POST /so_test/_doc/
{
   "captions_added": true,
   "category"      : "Mental Health",
   "is_votable"    : true,
   "members": [
        {"country": "Denmark", "date_of_birth": "1998-04-04T00:00:00"},
        {"country": "Denmark", "date_of_birth": "1999-05-05T00:00:00"}
   ]
}

POST /so_test/_doc/
{
   "captions_added": true,
   "category"      : "Mental breakdown",
   "is_votable"    : true,
   "members": []
}

POST /so_test/_doc/
{
   "captions_added": true,
   "category"      : "Mental success",
   "is_votable"    : true,
   "members": [
        {"country": "France", "date_of_birth": "1998-04-04T00:00:00"},
        {"country": "Japan", "date_of_birth": "1999-05-05T00:00:00"}
   ]
}

Then I ran this query (it is only a bool filter, but adapting it to your own use case should not prove too difficult):

GET /so_test/_search
{
  "query":{
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": """
            def flag = false;
            
            // /!\ notice how the field is referenced /!\
            if(doc['members.country'].size() != 0)
            {
              for (item in doc['members.country']) {
                if (item == params.country){
                  flag = true;
                }
              } 
            }
            return flag;
            """,
            "params": {
              "country": "Japan"
            }
          }
        }
      }
    }
  }
}
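To see which of the three sample documents this filter should keep, here is a small Python model of the script's logic. It is a sketch under assumptions: each document is written by hand in its flattened form, a missing `members.country` key stands in for `doc['members.country'].size() == 0`, and `matches_country` is a hypothetical name.

```python
def matches_country(flat_doc, country):
    """Mirror the Painless filter: True if any members.country value equals country."""
    # An empty list stands in for a field with no values (size() == 0).
    values = flat_doc.get("members.country", [])
    flag = False
    if len(values) != 0:
        for item in values:
            if item == country:
                flag = True
    return flag


# Hand-flattened versions of the three indexed documents.
docs = [
    {"category": "Mental Health", "members.country": ["Denmark"]},
    {"category": "Mental breakdown"},  # "members": [] leaves no values
    {"category": "Mental success", "members.country": ["France", "Japan"]},
]

hits = [d["category"] for d in docs if matches_country(d, "Japan")]
print(hits)  # ['Mental success']
```

The document with an empty members array is simply skipped, with no null check needed, because the size test runs before any value is read.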

By the way, you said you were a bit confused about the contexts for Painless. You can find some details about them in the documentation. [doc]

In this case the filter context is the one we want to look at.

Upvotes: 4
