Arpit Agarwal
Arpit Agarwal

Reputation: 348

Merge two array of objects with common key using jq command

I have two datasets:

data1='[
    { "bookings": 2984, "timestamp": 1675854900 },
    { "bookings": 2967, "timestamp": 1675855200 }
]'

data2='[
    { "errors": 51, "timestamp": 1675854900 },
    { "errors": 90, "timestamp": 1675855200 }
]'

I want the output to be:

combined='[
    { "errors": 51, bookings: 2984, "timestamp": 1675854900 },
    { "errors": 90, bookings: 2967, "timestamp": 1675855200 }
]'

Can this be achieved by shell scripting and jq command?

Assume that timestamp will always be present and will always have a common value across two datasets. Even the order is same.

Upvotes: 0

Views: 102

Answers (5)

pmf
pmf

Reputation: 36033

This last paragraph just caught my attention:

Assume that timestamp will always be present and will always have a common value across two datasets. Even the order is same.

If this is truly the case then it is reasonable to assume that both arrays have the same length and their items are aligned respectively. Thus, there's no need to build up a hash-based INDEX as accessing the items by their numeric keys (positions within the arrays) can already be achieved in constant time.

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  $data1 | [keys[] | $data2[.] + $data1[.]]
'
[
  {
    "errors": 51,
    "timestamp": 1675854900,
    "bookings": 2984
  },
  {
    "errors": 90,
    "timestamp": 1675855200,
    "bookings": 2967
  }
]

Upvotes: 1

peak
peak

Reputation: 116690

In general, if you find JOIN a bit tricky to understand or use, then consider using INDEX for this type of problem. In the present case, you could get away with a trivially simple approach, e.g.:

jq -n --argjson data1 "$data1"  --argjson data2 "$data2" '
  INDEX($data1[]; .timestamp) as $dict
  | $data2 | map( . + $dict[.timestamp|tostring])

Upvotes: 1

pmf
pmf

Reputation: 36033

A simple JOIN operation could do:

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
'
[
  {
    "errors": 51,
    "timestamp": 1675854900,
    "bookings": 2984
  },
  {
    "errors": 90,
    "timestamp": 1675855200,
    "bookings": 2967
  }
]

I'm getting this error: jq: error: JOIN/4 is not defined at <top-level>, line 2: [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)] jq: 1 compile error

You are probably using an older version of jq. JOIN and INDEX were introduced in jq 1.6. Either define them yourself by taking their definitions from source, or take those definitions and modify them to fit your very use case (both work well with jq 1.5).

Definitions from source:

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  def INDEX(stream; idx_expr):
    reduce stream as $row ({}; .[$row | idx_expr | tostring] = $row);
  def JOIN($idx; stream; idx_expr; join_expr):
    stream | [., $idx[idx_expr]] | join_expr;

  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
'

Adapted to your use case:

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  ($data1 | with_entries(.key = (.value.timestamp | @text))) as $ix
  | $data2 | map(. + $ix[.timestamp | @text])
'

Upvotes: 1

Arpit Agarwal
Arpit Agarwal

Reputation: 348

By the way, I have trying to this answer from AI since morning and finally it also gave me correct solution this time

#!/bin/bash

data1='[
    { "bookings": 2984, "timestamp": 1675854900 },
    { "bookings": 2967, "timestamp": 1675855200 }
]'

data2='[
    { "errors": 51, "timestamp": 1675854900 },
    { "errors": 90, "timestamp": 1675855200 }
]'

combined=$(jq -n --argjson d1 "$data1" --argjson d2 "$data2" '
  [ $d1, $d2 ] | transpose[] | group_by(.timestamp) | map(
    reduce .[] as $i ({}; . * $i)
  )
')

echo "$combined"

Just pasting it here for you guys in case you didn't think of this method

Upvotes: 0

Charles Duffy
Charles Duffy

Reputation: 295288

Another way to do this is to build a map from timestamps to error counts, and perform a lookup in it.

jq -n '
  input as $data1
| input as $data2
| ($data2
   | map({ "key": (.timestamp | tostring), "value": .errors }) 
   | from_entries
  ) as $errors_by_timestamp
| $data1 | map(.errors = $errors_by_timestamp[(.timestamp | tostring)])
' <<<"$data1 $data2"

Upvotes: 0

Related Questions