zrvan

Reputation: 7743

Elegant way to select nested objects with the associated key based on a specific criterion

Given an example document in JSON similar to this:

{
  "id": "post-1",
  "type": "blog-post",
  "tags": [
    {
      "id": "tag-1",
      "name": "Tag 1"
    },
    {
      "id": "tag-2",
      "name": "Tag 2"
    }
  ],
  "heading": "Post 1",
  "body": "this is my first blog post",
  "links": [
    {
      "id": "post-2",
      "heading": "Post 2",
      "tags": [
        {
          "id": "tag-1",
          "name": "Tag 1"
        },
        {
          "id": "tag-3",
          "name": "Tag 3"
        }
      ]
    }
  ],
  "metadata": {
    "user": {
      "social": [
        {
          "id": "twitter",
          "handle": "@user"
        },
        {
          "id": "facebook",
          "handle": "123456"
        },
        {
          "id": "youtube",
          "handle": "ABC123xyz"
        }
      ]
    },
    "categories": [
      {
        "name": "Category 1"
      },
      {
        "name": "Category 2"
      }
    ]
  }
}

I would like to select every object (at any depth) that has an "id" attribute, together with the name of the attribute in the parent object that holds it. The example above is only that, an example. The actual data, which I'm not at liberty to share, can have any depth and just about any structure, and attributes can be added and removed at any time. I used the blog-post style only because it is popular for examples and my imagination is very limited.

The attribute name signifies a particular type within the domain, which might also be (but is not necessarily) encoded in the value of the "id" attribute.

If an object does not have the "id" attribute it is not interesting and should not be selected.

A very important special case is when the value of an attribute is an array of objects; in that case I need to keep the attribute name and associate it with each element of the array.

An example of the desired output would be:

[
  {
    "type": "tags",
    "node": {
      "id": "tag-1",
      "name": "Tag 1"
    }
  },
  {
    "type": "tags",
    "node": {
      "id": "tag-2",
      "name": "Tag 2"
    }
  },
  {
    "type": "links",
    "node": {
      "id": "post-2",
      "heading": "Post 2",
      "tags": [
        {
          "id": "tag-1",
          "name": "Tag 1"
        },
        {
          "id": "tag-3",
          "name": "Tag 3"
        }
      ]
    }
  },
  {
    "type": "tags",
    "node": {
      "id": "tag-1",
      "name": "Tag 1"
    }
  },
  {
    "type": "tags",
    "node": {
      "id": "tag-3",
      "name": "Tag 3"
    }
  },
  {
    "type": "social",
    "node": {
      "id": "twitter",
      "handle": "@user"
    }
  },
  {
    "type": "social",
    "node": {
      "id": "facebook",
      "handle": "123456"
    }
  },
  {
    "type": "social",
    "node": {
      "id": "youtube",
      "handle": "ABC123xyz"
    }
  }
]

The output doesn't have to be identical: order, for instance, is irrelevant for my use case, and the results could just as well be grouped. Since the top-level object has an "id" attribute it could be included under a special name, but I'd prefer it not to be included at all.

I've tried walk, reduce and recurse, to no avail; I'm afraid my jq skills are too limited. But I imagine a good solution would use at least one of them.

I would like an expression something like this:

to_entries[] | .value | .. | select(has("id")?)

which would select the correct objects, but after .. I no longer have access to the associated attribute name.
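One way to keep the attribute name that .. discards is to do the recursion by hand and pass the name down as a function argument, emitting each object that has an "id" together with the name it was reached through. A minimal sketch, assuming jq 1.5 or later (needed for $name-style parameters) run from a POSIX shell; the function name idnodes and the trimmed input document are made up for illustration:

```shell
#!/bin/sh
# Recursive alternative to `..` that passes the attribute name down.
# idnodes is a hypothetical helper name; needs jq 1.5+ for $-parameters.
prog='
def idnodes($name):
  if type == "object" then
    # Emit this object if it has an "id" and was reached through an attribute
    # (the top-level object has no attribute name, so it is skipped).
    (select(has("id") and $name != null) | {type: $name, node: .}),
    # Recurse into every attribute, passing its name down.
    (to_entries[] | .key as $key | .value | idnodes($key))
  elif type == "array" then
    # Array elements inherit the name of the attribute holding the array.
    .[] | idnodes($name)
  else
    empty
  end;
[idnodes(null)]'

# A trimmed version of the example document from the question.
result=$(jq -c "$prog" <<'EOF'
{"id": "post-1",
 "tags": [{"id": "tag-1", "name": "Tag 1"}],
 "metadata": {"user": {"social": [{"id": "twitter", "handle": "@user"}]},
              "categories": [{"name": "Category 1"}]}}
EOF
)
echo "$result"
```

On the full example document this should emit the same eight entries as the desired output, in the same order. Objects inside arrays nested within arrays get the name of the nearest enclosing attribute, which matches taking the last string segment of a path.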

The best I've come up with so far is:

. as $document
| [paths | if length > 1 and .[-1] == "id" then .[0:-1] else empty end] 
| map(. as $path 
      | $document 
      | { "type": [$path[] | if type == "string" then . else empty end][-1],
           "node": getpath($path) })

This works, but it feels quite complicated: it first extracts all paths and ignores any path whose last element is not "id", then removes the "id" segment to get the path to the actual object, and keeps the last segment of that path that is a string, which is the name of the parent object's attribute containing the interesting object. Finally, the object itself is selected with getpath.
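To see what the first steps of this pipeline work with, it can help to print the intermediate paths for a small document. A minimal illustration (the two-attribute input is made up): every path ending in "id" is kept, and the length > 1 test is what drops ["id"], the path of the top-level object's own "id".

```shell
#!/bin/sh
doc='{"id": "post-1", "tags": [{"id": "tag-1", "name": "Tag 1"}]}'

# All paths whose last segment is "id" (the top-level ["id"] is still included).
all=$(printf '%s' "$doc" | jq -c '[paths | select(.[-1] == "id")]')
echo "$all"   # prints: [["id"],["tags",0,"id"]]

# Drop the top-level one and strip the trailing "id" to get paths to the objects.
objs=$(printf '%s' "$doc" | jq -c '[paths | select(length > 1 and .[-1] == "id") | .[:-1]]')
echo "$objs"  # prints: [["tags",0]]
```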

Is there a more elegant, or at the least shorter way to express this?

I should note that I'd like to use jq for the convenience of having bindings to other languages as well as being able to run the program on the command line.

For the scope of this question I'm not interested in alternatives to jq. I can imagine how to solve this with other tooling, but I would really like to "just" use jq.

Upvotes: 2

Views: 561

Answers (2)

peak

Reputation: 116740

Since the actual requirements aren’t clear to me, I’ll assume that the given implementation defines the functional requirements, and propose a shorter and hopefully sleeker version:

. as $document
| paths
| select(length > 1 and .[-1] == "id")
| .[0:-1] as $path
| { "type": last($path[] | strings),
    "node": $document | getpath($path) }

This produces a stream, so if you want an array, you could simply enclose the above in square brackets.
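As a quick check of the stream-versus-array point, the same program can be run once as written (one compact object per line) and once wrapped in square brackets (a single array). A small sketch using a made-up two-tag input:

```shell
#!/bin/sh
doc='{"id": "post-1", "tags": [{"id": "tag-1"}, {"id": "tag-2"}]}'
prog='. as $document
  | paths
  | select(length > 1 and .[-1] == "id")
  | .[0:-1] as $path
  | { "type": last($path[] | strings),
      "node": $document | getpath($path) }'

# As written: a stream, printed as one JSON value per line.
stream=$(printf '%s' "$doc" | jq -c "$prog")
echo "$stream"

# Enclosed in square brackets: one array holding every result.
array=$(printf '%s' "$doc" | jq -c "[ $prog ]")
echo "$array"
```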

last(stream) emits null if the stream is empty, which accords with the behavior of .[-1].

Upvotes: 1

Jeff Mercado

Reputation: 134851

This works:

[
    foreach (paths | select(.[-1] == "id" and length > 1)[:-1]) as $path ({i:.};
        .o = {
            type: last($path[] | strings),
            node: (.i | getpath($path))
        };
        .o
    )
]

The trick is knowing that any number in a path means the value is an element of an array, so the path has to be adjusted to get the parent attribute's name. Using last/1 with the strings filter makes that simple.
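For example, tag-3 inside the first link has its "id" at the path ["links",0,"tags",1,"id"] in the example document. After the trailing "id" is removed, the numeric segments are array indexes, so the last string segment is the attribute name wanted. A small check with jq -n, writing that path out by hand:

```shell
#!/bin/sh
# Path of tag-3's "id" in the example document, written out by hand.
name=$(jq -nr '["links", 0, "tags", 1, "id"] | .[:-1] | last(.[] | strings)')
echo "$name"   # prints: tags

# Without the strings filter, the last segment is just the array index.
idx=$(jq -nc '["links", 0, "tags", 1, "id"] | .[:-1] | .[-1]')
echo "$idx"    # prints: 1
```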

Upvotes: 1
