Thai Q Pham

Reputation: 35

Find common keys in JSON objects using jq

I'm trying to find all the common keys in a JSON file, given that the key names are not known in advance.

The JSON file looks like:

{
   "DynamicKey1" : {
    "foo" : 1,
    "bar" : 2

   },
   "DynamicKey2" : {
     "bar" : 3

   },
   "DynamicKey3" : {
     "foo" : 5,
     "zyx" : 5

   }   
}

Expected result:

{
 "foo"
}

I was trying to apply reduce/foreach logic here but I am not sure how to write it in jq. I appreciate any help!!

jq '. as $ss | reduce range(1; $ss|length) as $i ([]; . + reduce ($ss[i] | keys) as $key ([]; if $ss[$i - 1] | has($key) then . +$key else . end))' file.json

Upvotes: 1

Views: 1200

Answers (2)

peak

Reputation: 116750

Here is a sort-free and time-efficient answer that relies on the efficiency of jq's implementation of lookups in a JSON dictionary. Since keys are strings, we can simply use the concept of a "bag of words" (bow):

def bow(stream): 
  reduce stream as $word ({}; .[$word|tostring] += 1);

We can now solve the "Keys common to all objects" problem as follows:

length as $length
| bow(.[] | keys_unsorted[])
| to_entries[]
| select(.value==$length).key

And similarly for the "Keys in more than one object" problem.
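
As a minimal sketch of that variant, the filter below reuses bow and only changes the selection test to "seen more than once". The shell wrapper, the variable names, and the trailing sort (added only to make the output order predictable) are illustrative assumptions, not part of the answer above; it assumes jq 1.5 or later for keys_unsorted.

```shell
#!/bin/sh
# Sample input taken from the question.
input='{"DynamicKey1":{"foo":1,"bar":2},"DynamicKey2":{"bar":3},"DynamicKey3":{"foo":5,"zyx":5}}'

# Count how many objects each key occurs in, then keep the keys counted more than once.
result=$(printf '%s' "$input" | jq -c '
  def bow(stream):
    reduce stream as $word ({}; .[$word|tostring] += 1);
  [ bow(.[] | keys_unsorted[])
    | to_entries[]
    | select(.value > 1).key ]
  | sort
')
echo "$result"
```

For the question's input the intermediate bag is {"foo":2,"bar":2,"zyx":1}, so the script prints ["bar","foo"].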

Of course, this time efficiency comes with the usual space-time tradeoff.

Upvotes: 0

peak

Reputation: 116750

There are some inconsistencies in the question as posted: there are no keys common to all the objects, and if one looks at the pair-wise intersection of keys, the result would include both "foo" and "bar".

In the following, I'll present solutions for both these problems.

Keys in more than one object

[.[] | keys_unsorted[]] | group_by(.)[] | select(length>1)[0]

Keys in all the objects

Here's a solution using a similar approach:

length as $length
| [.[] | keys_unsorted[]] | group_by(.)[]
| select(length==$length) 
| .[0]

This involves group_by/1, which is implemented using a sort.
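
To see the group_by-based filter in action, here is a small sketch that runs it on the question's input and on a hypothetical variant in which "bar" occurs in every object. The shell function name common_keys, the variant data, and the enclosing array (used only so that an empty result is visible) are assumptions made for illustration.

```shell
#!/bin/sh
# Wrap the "keys in all the objects" filter; the outer [ ... ] collects the stream into an array.
common_keys() {
  jq -c '[ length as $length
           | [.[] | keys_unsorted[]] | group_by(.)[]
           | select(length == $length)
           | .[0] ]'
}

# Input from the question: no key occurs in all three objects.
question_input='{"DynamicKey1":{"foo":1,"bar":2},"DynamicKey2":{"bar":3},"DynamicKey3":{"foo":5,"zyx":5}}'
none=$(printf '%s' "$question_input" | common_keys)

# Hypothetical variant in which "bar" is present in every object.
variant_input='{"a":{"foo":1,"bar":2},"b":{"bar":3},"c":{"bar":4,"zyx":5}}'
common=$(printf '%s' "$variant_input" | common_keys)

echo "$none"
echo "$common"
```

The first line printed is [] and the second is ["bar"], which matches the observation that the posted input has no key common to all the objects.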

Here is an alternative approach that relies on the built-in function keys to do the sorting (the point being that nk ln(nk) - n(k ln(k)) = nk ln(n), i.e. n small sorts of k items each are cheaper than one large sort of n*k items):
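
The identity can be checked in one line. This is only a sketch under simplifying assumptions that the answer does not state explicitly: there are n objects, each with exactly k keys, and sorting m items costs on the order of m ln(m) comparisons.

```latex
\begin{aligned}
\underbrace{nk\ln(nk)}_{\text{one sort of } nk \text{ items}}
  \;-\; \underbrace{n\,\bigl(k\ln k\bigr)}_{n \text{ sorts of } k \text{ items}}
  &= nk\,(\ln n + \ln k) - nk\ln k \\
  &= nk\ln n .
\end{aligned}
```

Under those assumptions the saving grows with the number of objects n, and vanishes when n = 1.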

# The intersection of an arbitrary number of sorted arrays
def intersection_of_sorted_arrays:
  # intersecting/2 returns a stream of the items common to $A and $B
  def intersecting($A; $B):
    # pop takes [$i, $j], the current positions in $A and $B
    def pop:
      .[0] as $i
      | .[1] as $j
      | if $i == ($A|length) or $j == ($B|length) then empty
        elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
        elif $A[$i] <  $B[$j] then [$i+1, $j] | pop
        else [$i, $j+1] | pop
        end;
    [0,0] | pop;
  reduce .[1:][] as $x (.[0]; [intersecting(.; $x)]);

To compute the keys common to all the objects:

[.[] | keys] | intersection_of_sorted_arrays
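
Putting the definition and the call together, a self-contained sketch might look like the following. The shell function name common_keys_sorted and the variant input are hypothetical; the jq definition is the one given above.

```shell
#!/bin/sh
# Run the merge-style intersection over the sorted key lists of all objects.
common_keys_sorted() {
  jq -c '
    def intersection_of_sorted_arrays:
      def intersecting($A; $B):
        def pop:
          .[0] as $i
          | .[1] as $j
          | if $i == ($A|length) or $j == ($B|length) then empty
            elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
            elif $A[$i] <  $B[$j] then [$i+1, $j] | pop
            else [$i, $j+1] | pop
            end;
        [0,0] | pop;
      reduce .[1:][] as $x (.[0]; [intersecting(.; $x)]);
    [.[] | keys] | intersection_of_sorted_arrays
  '
}

# Input from the question: the intersection is empty.
question_input='{"DynamicKey1":{"foo":1,"bar":2},"DynamicKey2":{"bar":3},"DynamicKey3":{"foo":5,"zyx":5}}'
none=$(printf '%s' "$question_input" | common_keys_sorted)

# Hypothetical variant in which "bar" is present in every object.
variant_input='{"a":{"foo":1,"bar":2},"b":{"bar":3},"c":{"bar":4,"zyx":5}}'
common=$(printf '%s' "$variant_input" | common_keys_sorted)

echo "$none"
echo "$common"
```

This prints [] for the question's input and ["bar"] for the variant. Because keys returns each object's keys already sorted, the reduce only ever merges two sorted arrays at a time.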

Upvotes: 2
