Matt

Reputation: 7249

How to group JSON using jq and then convert to YAML

I have this JSON that I want to convert.

[
  {
    "externalGroup": "another group admins",
    "groupId": "da2e42c8-6423-4d32-99b5-5fc58f9f80b8"
  },
  {
    "externalGroup": "another group users",
    "groupId": "7c69cac1-4a70-4170-8251-cde3762fe498"
  },
  {
    "externalGroup": "my group admin",
    "groupId": "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
  },
  {
    "externalGroup": "my group users",
    "groupId": "8370821e-edfa-4615-ac2e-47815b740f40"
  },
  {
    "externalGroup": "some group",
    "groupId": "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
  },
  {
    "externalGroup": "some group",
    "groupId": "8370821e-edfa-4615-ac2e-47815b740f40"
  },
  {
    "externalGroup": "some group",
    "groupId": "7c69cac1-4a70-4170-8251-cde3762fe498"
  }
]

I have tried this, which is pretty close: jq '. | group_by(.externalGroup)[] | {(.[0].externalGroup): map(.groupId)}'

I get this:

{
  "another group admins": [
    "da2e42c8-6423-4d32-99b5-5fc58f9f80b8"
  ]
}
{
  "another group users": [
    "7c69cac1-4a70-4170-8251-cde3762fe498"
  ]
}
{
  "my group admin": [
    "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
  ]
}
{
  "my group users": [
    "8370821e-edfa-4615-ac2e-47815b740f40"
  ]
}
{
  "some group": [
    "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a",
    "8370821e-edfa-4615-ac2e-47815b740f40",
    "7c69cac1-4a70-4170-8251-cde3762fe498"
  ]
}

But this is a stream of separate objects, which doesn't convert properly with yq. It would need to be a single object, like this:

{
  "another group admins": [
    "da2e42c8-6423-4d32-99b5-5fc58f9f80b8"
  ],
  "another group users": [
    "7c69cac1-4a70-4170-8251-cde3762fe498"
  ],
  "my group admin": [
    "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
  ],
  "my group users": [
    "8370821e-edfa-4615-ac2e-47815b740f40"
  ],
  "some group": [
    "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a",
    "8370821e-edfa-4615-ac2e-47815b740f40",
    "7c69cac1-4a70-4170-8251-cde3762fe498"
  ]
}

That way the YAML would come out something like this:

"another group admins":
  - "da2e42c8-6423-4d32-99b5-5fc58f9f80b8"
"another group users":
  - "7c69cac1-4a70-4170-8251-cde3762fe498"
"my group admin":
  - "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
"my group users":
  - "8370821e-edfa-4615-ac2e-47815b740f40"
"some group":
  - "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
  - "8370821e-edfa-4615-ac2e-47815b740f40"
  - "7c69cac1-4a70-4170-8251-cde3762fe498"

Upvotes: 2

Views: 924

Answers (2)

peak

Reputation: 116670

  1. An alternative worth considering for producing YAML is gojq, the Go implementation of jq, e.g.
     gojq --yaml-output '
       group_by(.externalGroup)
       | map({(.[0].externalGroup): map(.groupId)}) | add'
  2. To avoid the overhead of map, you could use the following generic stream-oriented add, which works for objects or arrays just as well as for numbers:
     gojq --yaml-output '
       def add(s): reduce s as $x (null; . + $x);
       add( group_by(.externalGroup)[]
            | {(.[0].externalGroup): map(.groupId)})'
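
To see what the stream-oriented add does on its own, here is a minimal sketch using plain jq rather than gojq. It assumes jq is on the PATH; the two objects and their ids are made-up placeholders, not data from the question. The reduce starts from null and merges each object from the stream as it arrives, so no intermediate array is built.

```shell
# Placeholder stream of two one-key objects (ids are illustrative only).
# The comma inside add(...) produces a stream, not two arguments.
result=$(jq -c -n '
  def add(s): reduce s as $x (null; . + $x);
  add( {"my group admin": ["id-1"]}, {"some group": ["id-2", "id-3"]} )')
echo "$result"
# prints {"my group admin":["id-1"],"some group":["id-2","id-3"]}
```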

Upvotes: 0

Weeble

Reputation: 17890

The piece you are missing is from_entries, which builds a single JSON object from an array of {key, value} entries.

Instead of:

jq '. | group_by(.externalGroup)[] | {(.[0].externalGroup): map(.groupId)}'

Try:

jq 'group_by(.externalGroup) | map({key:.[0].externalGroup, value:map(.groupId)}) | from_entries'
{
  "another group admins": [
    "da2e42c8-6423-4d32-99b5-5fc58f9f80b8"
  ],
  "another group users": [
    "7c69cac1-4a70-4170-8251-cde3762fe498"
  ],
  "my group admin": [
    "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
  ],
  "my group users": [
    "8370821e-edfa-4615-ac2e-47815b740f40"
  ],
  "some group": [
    "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a",
    "8370821e-edfa-4615-ac2e-47815b740f40",
    "7c69cac1-4a70-4170-8251-cde3762fe498"
  ]
}

I made the following changes:

  • Removed the . | at the beginning because it doesn't change anything.
  • Removed the [] and used map(...) instead, because we want to keep things in an array to feed to from_entries.
  • Instead of assembling a one-entry object, we create {key:..., value:...} pairs to feed to from_entries.
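
As a minimal sketch of the shape from_entries expects, the following feeds it a hand-written array of entries. It assumes jq is on the PATH; the ids are placeholders, not values from the question.

```shell
# Each array element is one {key, value} entry; from_entries folds them
# into a single object. The ids are illustrative placeholders.
result=$(jq -c -n '
  [ {key: "my group admin", value: ["id-1"]},
    {key: "some group",     value: ["id-2", "id-3"]} ]
  | from_entries')
echo "$result"
# prints {"my group admin":["id-1"],"some group":["id-2","id-3"]}
```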

Update: I just checked and was slightly surprised to discover that add is a bit faster than from_entries, even for very long lists. If you use add, you need to change even less of your solution:

jq 'group_by(.externalGroup) | map({(.[0].externalGroup):map(.groupId)}) | add'

Adding objects together merges their keys into a single object. I tested with a 250,000-element list and add was slightly faster than from_entries. Given that it's also shorter and, in my opinion, just as clear, I think it's worth considering.
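
If you want to convince yourself that the two variants agree, a rough check along these lines should do. It assumes jq is on the PATH and uses a trimmed three-record subset of the input from the question; the variable names are just for illustration.

```shell
# Trimmed subset of the question's input: one single-member group and
# one group with two members.
input='[
  {"externalGroup": "my group admin", "groupId": "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"},
  {"externalGroup": "some group", "groupId": "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"},
  {"externalGroup": "some group", "groupId": "8370821e-edfa-4615-ac2e-47815b740f40"}
]'

# Variant 1: merge the one-key objects with add.
via_add=$(printf '%s' "$input" |
  jq -c 'group_by(.externalGroup) | map({(.[0].externalGroup): map(.groupId)}) | add')

# Variant 2: build {key, value} entries and hand them to from_entries.
via_from_entries=$(printf '%s' "$input" |
  jq -c 'group_by(.externalGroup) | map({key: .[0].externalGroup, value: map(.groupId)}) | from_entries')

echo "$via_add"
if [ "$via_add" = "$via_from_entries" ]; then echo "same result"; fi
```

Both variables should hold the same single object, which is the form that can then be handed to a YAML converter.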

Upvotes: 3
