Russell Coker

Reputation: 38

How do I get jq to return unique results when json has multiple identical entries?

jq '
  .[]|select(.accountEnabled==true)|select(.assignedPlans[].service=="exchange" and .assignedPlans[].capabilityStatus=="Enabled").proxyAddresses[]'

Below is a sample of JSON. It is the output of "az ad user list" (which gets the Active Directory user list from Azure), anonymised and with irrelevant fields removed. Above is the jq command that I want to use to extract email addresses. The desired output is "SMTP:[email protected]" printed once, not 9 times. Yes, I know I could pipe this to the Unix command "sort -u", but I'd like to do other JSON queries on it.

[
  {
    "accountEnabled": true,
    "assignedPlans": [
      {
        "capabilityStatus": "Enabled",
        "service": "exchange"
      },
      {
        "capabilityStatus": "Enabled",
        "service": "exchange"
      },
      {
        "capabilityStatus": "Enabled",
        "service": "exchange"
      }
    ],
    "provisionedPlans": [
      {
        "capabilityStatus": "Enabled",
        "provisioningStatus": "Success",
        "service": "exchange"
      },
      {
        "capabilityStatus": "Enabled",
        "provisioningStatus": "Success",
        "service": "exchange"
      },
      {
        "capabilityStatus": "Enabled",
        "provisioningStatus": "Success",
        "service": "exchange"
      },
      {
        "capabilityStatus": "Enabled",
        "provisioningStatus": "Success",
        "service": "exchange"
      }
    ],
    "proxyAddresses": [
      "SMTP:[email protected]"
    ]
  },
  {
    "accountEnabled": true,
    "assignedPlans": [
      {
        "capabilityStatus": "Deleted",
        "service": "exchange"
      },
      {
        "capabilityStatus": "Deleted",
        "service": "OfficeForms"
      }
    ],
    "provisionedPlans": [
      {
        "capabilityStatus": "Deleted",
        "provisioningStatus": "Success",
        "service": "SharePoint"
      },
      {
        "capabilityStatus": "Deleted",
        "provisioningStatus": "Success",
        "service": "exchange"
      },
      {
        "capabilityStatus": "Deleted",
        "provisioningStatus": "Success",
        "service": "exchange"
      }
    ],
    "proxyAddresses": [
      "smtp:[email protected]",
      "smtp:[email protected]",
      "SMTP:[email protected]"
    ]
  }
]

Upvotes: 1

Views: 890

Answers (2)

peak

Reputation: 116690

The question says: "Above is a jq command that I want to use".

The following response focuses on the above requirement.

unique/0 could be used if you don't mind that it sorts its input and returns the distinct values in sorted order. This filter expects an array as input, so you could modify your query as follows:

[.[]
 | select(.accountEnabled==true)
 | select(.assignedPlans[].service=="exchange" and .assignedPlans[].capabilityStatus=="Enabled")
 | .proxyAddresses[]]
| unique

This produces an array, so if you want a stream of values instead, append [] to the filter (that is, end it with unique[]).
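As a rough illustration of the array-versus-stream difference, here is a small sketch run from the shell. The addresses are made-up placeholders, not values from the question, and the variable names are only for this example.

```shell
# Hypothetical input: an array of addresses with one duplicate.
json='["smtp:b@example.com","SMTP:a@example.com","smtp:b@example.com"]'

# unique returns a sorted array with duplicates removed.
sorted=$(printf '%s' "$json" | jq -c 'unique')

# Appending [] turns that array into a stream, one value per line.
stream=$(printf '%s' "$json" | jq -r 'unique[]')

echo "$sorted"   # ["SMTP:a@example.com","smtp:b@example.com"]
echo "$stream"
```

Note that the uppercase "SMTP:" entry sorts first, because jq orders strings by Unicode codepoint.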

A stream-oriented approach

Under some circumstances, it may be desirable to avoid the sort that unique/0 uses. Here is a stream-oriented solution using a generic filter, uniques/1, which involves no sorting and which has other potential advantages, though it is a bit tricky to define because it puts no restrictions on the stream.

def uniques(stream):
  foreach stream as $s ({};
     ($s|type) as $t
     | (if $t == "string" then $s else ($s|tostring) end) as $y
     | if .[$t][$y]
       then .emit = false
       else .emit = true | (.item = $s) | (.[$t][$y] = true)
       end;
     if .emit then .item else empty end );

Using uniques/1, a small tweak to the previous solution is sufficient:

uniques(.[]
 | select(.accountEnabled==true)
 | select(.assignedPlans[].service=="exchange" and .assignedPlans[].capabilityStatus=="Enabled")
 | .proxyAddresses[] )
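To see how uniques/1 differs from unique/0, the following sketch runs both on a small made-up stream (the values are illustrative only). The definition is copied from above; it should keep items in first-seen order and treat the number 1 and the string "1" as distinct, whereas unique sorts.

```shell
# uniques/1 as defined above; the sample values are hypothetical.
first_seen=$(jq -nc '
def uniques(stream):
  foreach stream as $s ({};
     ($s|type) as $t
     | (if $t == "string" then $s else ($s|tostring) end) as $y
     | if .[$t][$y]
       then .emit = false
       else .emit = true | (.item = $s) | (.[$t][$y] = true)
       end;
     if .emit then .item else empty end );
[uniques("b", "a", "b", 1, "1")]
')

# The same values through unique/0, for comparison.
sorted=$(jq -nc '["b", "a", "b", 1, "1"] | unique')

echo "$first_seen"   # ["b","a",1,"1"]
echo "$sorted"       # [1,"1","a","b"]
```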

Upvotes: 1

peak

Reputation: 116690

Perhaps the problem is that the given jq query is simply "wrong" in that it does not capture the OP's intent.
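One way to see where the 9 copies come from: in the original condition, each .assignedPlans[] is an independent generator, so the "and" is evaluated for every pairing of a service with a capabilityStatus, and select emits its input once per true result. A minimal sketch, using a hypothetical one-user input with three matching plans and a placeholder address:

```shell
# Hypothetical input modelled on the first user in the question.
json='[{"assignedPlans":[
  {"capabilityStatus":"Enabled","service":"exchange"},
  {"capabilityStatus":"Enabled","service":"exchange"},
  {"capabilityStatus":"Enabled","service":"exchange"}],
  "proxyAddresses":["SMTP:user@example.com"]}]'

# Number of booleans the original condition produces: 3 services x 3 statuses.
pairs=$(printf '%s' "$json" | jq '[.[] | (.assignedPlans[].service=="exchange" and .assignedPlans[].capabilityStatus=="Enabled")] | length')

# Number of addresses emitted: one per true result of the condition.
count=$(printf '%s' "$json" | jq '[.[] | select(.assignedPlans[].service=="exchange" and .assignedPlans[].capabilityStatus=="Enabled") | .proxyAddresses[]] | length')

echo "$pairs $count"   # 9 9
```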

Even if the following query does not reflect the OP's intent, it is worth noting that, with the given JSON, it produces the single result that is wanted:

.[]
| select(.accountEnabled==true)
| select(any(.assignedPlans[];
             .service=="exchange" and
             .capabilityStatus=="Enabled"))
| .proxyAddresses[]

Likewise...

Here's another query with different semantics but which, with the given JSON, also produces the single desired result. (It goes to show that a single example by itself is no substitute for requirements.)

.[]
 | select(.accountEnabled==true)
 | select(any(.assignedPlans[]; .service=="exchange"))
 | select(any(.assignedPlans[]; .capabilityStatus=="Enabled"))
 | .proxyAddresses[]

Upvotes: 1
