Sam Morgan
Sam Morgan

Reputation: 23

JQ Recursive Tree expansion

I am attempting to parse a JSON structure to extract a dependency path, for use in an automation script.

The structure of this JSON is extracted to a format like this:

[
  {
    "Id": "abc",
    "Dependencies": [
    ]
  },
  {
    "Id": "def",
    "Dependencies": [
      "abc"
    ]
  },
  {
    "Id": "ghi",
    "Dependencies": [
      "def"
    ]
  }
]

Note: Lots of other irrelevant fields removed.

The plan is to be able to pass into my JQ command the Id of one of these and get back out a list.

Eg:

Input: abc
Expected Output: []

Input: def
Expected Output: ["abc"]

Input: ghi
Expected Output: ["abc", "def"]

Currently have a jq script like this (https://jqplay.org/s/NAhuXNYXXO):

    jq 
    '. as $original | .[] | 
    select(.Id == "INPUTVARIABLE") | 
    [.Dependencies[]] as $level1Dep | [$original[] | select( [ .Id == $level1Dep[] ] | any )] as $level1Full | $level1Full[] | 
    [.Dependencies[]] as $level2Dep | [$original[] | select ( [ .Id == $level2Dep[] ] | any )] as $level2Full | 
    [$level1Dep[], $level2Dep[]]'

Input: abc
Output: empty

Input: def Output: ["abc"]

Input: ghi Output: ["def","abc"]

Great! However, as you can see this is not particularly scale-able and will only handle two dependency levels (https://jqplay.org/s/Zs0xIvJ2Zn), and also falls apart horribly when there are multiple dependencies on an item (https://jqplay.org/s/eB9zHQSH2r).

Is there a way of constructing this within JQ or do I need to move out to a different language?

Upvotes: 2

Views: 742

Answers (2)

peak
peak

Reputation: 116750

Here's a short program that handles circular dependencies efficiently and illustrates how a subfunction can be defined after the creation of a local variable (here, $next) for efficiency:

def dependents($x):
  (map( {(.Id): .Dependencies}) | add) as $next
  # Input: array of dependents computed so far
  # Output: array of all dependents
  | def tc($x):
    ($next[$x] - .) as $new
    | if $new == [] then .
      else (. + $new | unique)
      # avoid calling unique again:
      | . + ([tc($new[])[]] - .)
      end ;
  [] | tc($x);

  dependents($start)

Usage

With the given input and an invocation such as

jq --arg start START -f program.jq input.json

the output for various values of START is:

START output
abc   []
def   ["abc"]
ghi   ["def", "abc"]

If the output must be sorted, then simply add a call to sort.

Upvotes: 1

oguz ismail
oguz ismail

Reputation: 50760

I know that the data cannot have circular dependencies, it is pulled from a database that enforces this.

It's trivial then. Reduce your input JSON down to an object where each Id and corresponding Dependencies array are paired, and walk through it aggregating dependencies using a recursive function.

def deps($depdb; $id):
  def _deps($id): $depdb[$id] // empty
    | . + map(_deps(.)[]);
  _deps($id);
deps(map({(.Id): .Dependencies}) | add; $fid)

Invocation:

jq -c --arg fid 'ghi' -f prog.jq file

Online demo - arbitrary dependency levels
Online demo - multiple dependencies per Id

Upvotes: 2

Related Questions