MasterJoe
MasterJoe

Reputation: 2335

JsonPath - Extract object meeting multiple criteria?

In the Json string given below, I want to find all elements in which category = m AND the "middle" array contains elements which match this condition - the element's "middle" array has objects whose itemType = Executable.

I would like to use jsonpath to get the desired objects. I prefer to not use jmespath because it can be too complex for my purpose. But, I am new to jsonpath and I am not able to figure out the json query from online tutorials which are too trivial or basic. I wonder if its better to use a programming language instead to get the data I need. Please advise.

So far, I was able to only extract elements in which category = m by using this jsonpath query $.[?(@.category=="m")]. How do I do the remaining part ?

Json : Overview - Every object has a "content" object. Each content object generally has a start, middle and end array besides other fields. Middle arrays can have multiple content objects inside them and so on. Some of the content objects have only a middle array. I am interested in locating items in such content objects as mentioned above.

Note that this is not the actual json which I have to process. It is an imitation which has been sanitized for SO.

{
  "id": "123",
  "contents": {
    "title": "B1",
    "start": [],
    "middle": [
      {
        "level": "1",
        "contents": {
          "title": "C1",
          "category": "c",
          "start": [],
          "middle": [
            {
              "level": "2",
              "contents": {
                "title": "M1",
                "category": "m",
                "start": [],
                "middle": [
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT1",
                      "middle": [
                        {
                          "itemType": "Data"
                        }
                      ]
                    }
                  },
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT2",
                      "middle": [
                        {
                          "itemType": "Executable",
                          "id": "exec1"
                        }
                      ]
                    }
                  },
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT3",
                      "middle": [
                        {
                          "itemType": "Data"
                        }
                      ]
                    }
                  }
                ],
                "end": []
              }
            },
            {
              "level": "2",
              "contents": {
                "title": "M2",
                "category": "m",
                "start": [],
                "middle": [
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT1",
                      "middle": [
                        {
                          "itemType": "Data"
                        }
                      ]
                    }
                  },
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT2",
                      "middle": [
                        {
                          "itemType": "Executable",
                          "id": "exec2"
                        }
                      ]
                    }
                  }
                ],
                "end": []
              }
            }
          ],
          "end": []
        }
      },
      {
        "level": "1",
        "contents": {
          "title": "C2",
          "category": "c",
          "start": [],
          "middle": [
            {
              "level": "2",
              "contents": {
                "title": "M1",
                "category": "m",
                "start": [],
                "middle": [
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT1",
                      "middle": [
                        {
                          "itemType": "Data"
                        }
                      ]
                    }
                  },
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT2",
                      "middle": [
                        {
                          "itemType": "Executable",
                          "id": "exec3"
                        }
                      ]
                    }
                  },
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT3",
                      "middle": [
                        {
                          "itemType": "Data"
                        }
                      ]
                    }
                  }
                ],
                "end": []
              }
            },
            {
              "level": "2",
              "contents": {
                "title": "M2",
                "category": "m",
                "start": [],
                "middle": [
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT1",
                      "middle": [
                        {
                          "itemType": "Data"
                        }
                      ]
                    }
                  },
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT2",
                      "middle": [
                        {
                          "itemType": "Executable",
                          "id": "exec4"
                        }
                      ]
                    }
                  },
                  {
                    "level": "3",
                    "contents": {
                      "title": "MAT3",
                      "middle": [
                        {
                          "itemType": "Data"
                        }
                      ]
                    }
                  }
                ],
                "end": []
              }
            }
          ],
          "end": []
        }
      }
    ],
    "end": []
  }
}

Upvotes: 1

Views: 3527

Answers (1)

dreftymac
dreftymac

Reputation: 32370

Context

  • json with nested objects1
  • jsonpath expression language
  • choosing between jsonpath and jmespath (or other JSON expression engine)

Problem

  • DeveMasterJoe2 wants to extract some values from nested JSON

Discussion

  • There are lots of implementations of jsonpath out there, and they do not all support the same features
  • The structure and normalization of the source JSON is going to influence how easily this can be done with pure jsonpath
  • In choosing a JSON expression engine, one has to weigh multiple factors
    • how consistent are the implementations across languages?
    • how many choices are there within a given language?
    • how clear is the specification?
    • how many examples, unit-tests or tutorials are available?
    • who is supporting it?

Example solution using Python and jsonpath-ng

  • Here is an example solution using python 3.7 and jsonpath-ng
  • This example uses a mix of jsonpath and python instead of just pure jsonpath, because of the heavily-nested JSON
    • I will leave it for someone else to provide an answer that relies on pure jsonpath
  • Note that the source JSON arguably could stand to be cleaned up a bit
    • (for example, why is there no id field attached to itemType==Data elements?)
    • (for example, why is category not found on all contents elements?)
    • (for example, if you expressly specify level why complicate things with heavily nested objects when you can determine depth by level ?)

This example:

## import libraries
import codecs
import json
import jsonpath_ng
from jsonpath_ng.ext import parse
##;;

## init vars
href="path/to/my/jsonfile/nested_dict.json"
json_string   = codecs.open(href, 'rb', encoding='utf8').read()
json_dataroot = json.loads(json_string)
final_result  = []
##;;

## init jsonpath outer-query
match         = parse('$..contents.middle[*]').find(json_dataroot)
##;;

## iterate through outer-query and gather subelements
for ijj,item in enumerate(match):
  ## restrict to desired category == 'm'
  if(match[ijj].value.get('contents',{}).get('category','') == 'm'):
    ## extract out desired subelements
    json_datafrag001  = [item.get('contents',{}).get('middle',{})[0] 
              for item in match[ijj].value.get('contents',{}).get('middle',{})
              ]
    match001 = parse("$[?(@.itemType=='Executable')]").find(json_datafrag001)
    final_result.extend(list(match001[ikk].value for ikk,item in enumerate(match001)))
pass
##;;

## show final result
vout = json.dumps(final_result, sort_keys=True,indent=4, separators=(',', ': '))
print(vout)
##;;

... produces this result ...

[
    {
        "id": "exec1",
        "itemType": "Executable"
    },
    {
        "id": "exec2",
        "itemType": "Executable"
    },
    {
        "id": "exec3",
        "itemType": "Executable"
    },
    {
        "id": "exec4",
        "itemType": "Executable"
    }
]

1 (aka dictionary, associative-array, hash)

Upvotes: 2

Related Questions