Nacho

Reputation: 1124

Using jq to parse keys present in two lists (even though a key might not exist in one of them)

(It was hard to come up with a title that summarizes the issue, so feel free to improve it).

I have a JSON file with the following content:

{
    "Items": [
        {
            "ID": {
                "S": "ID_Complete"
            }, 
            "oldProperties": {
                "L": [
                    {
                        "S": "[property_A : value_A_old]"
                    }, 
                    {
                        "S": "[property_B : value_B_old]"
                    }
                ]
            },
            "newProperties": {
                "L": [
                    {
                        "S": "[property_A : value_A_new]"
                    }, 
                    {
                        "S": "[property_B : value_B_new]"
                    }
                ]
            }
        }, 
        {
            "ID": {
                "S": "ID_Incomplete"
            }, 
            "oldProperties": {
                "L": [
                    {
                        "S": "[property_B : value_B_old]"
                    }
                ]
            },
            "newProperties": {
                "L": [
                    {
                        "S": "[property_A : value_A_new]"
                    }, 
                    {
                        "S": "[property_B : value_B_new]"
                    }
                ]
            }
        }
    ]
}

I would like to use jq to process this data so that, for each item in Items[] that has a new value for property_A (in the newProperties list), the output contains the corresponding id, old and new fields (see the desired output below), regardless of the value that property has in the oldProperties list. Moreover, if property_A does not exist in oldProperties, I still need the old field to be populated with null (or any fixed string, for that matter).

Desired output:

{
  "id": "ID_Complete",
  "old": "[property_A : value_A_old]",
  "new": "[property_A : value_A_new]"
}
{
  "id": "ID_Incomplete",
  "old": null,
  "new": "[property_A : value_A_new]"
}

Note: Even though property_A doesn't exist in the oldProperties list, other properties may (and will) exist.

The problem I am facing is that I am not able to get an output when the desired property does not exist in the oldProperties list. My current jq command looks like this:

jq -r '.Items[] | 
    { id:.ID.S, 
      old:.oldProperties.L[].S | select(. | contains("property_A")),
      new:.newProperties.L[].S | select(. | contains("property_A")) }'

This renders only the ID_Complete case, but I need the ID_Incomplete case as well.

Is there any way to achieve this using this tool?

Thanks in advance.

Upvotes: 3

Views: 264

Answers (2)

Jeff Mercado

Reputation: 134811

Your lists of properties appear to be values of some object. You could map each list into an object, diff the two objects, and then report on the results.

You could do something like this:

def make_object_from_properties:
      [.L[].S | capture("\\[(?<key>\\w+) : (?<value>\\w+)\\]")]
    | from_entries
    ;
def diff_objects($old; $new):
      def _prop($key): select(has($key))[$key];
      ([($old | keys[]), ($new | keys[])] | unique) as $keys
    | [   $keys[] as $k
        | ({ value: $old | _prop($k) } // { none: true }) as $o
        | ({ value: $new | _prop($k) } // { none: true }) as $n
        | (if   $o.none                 then "add"
          elif  $n.none                 then "remove"
          elif  $o.value != $n.value    then "change"
                                        else "same"
          end) as $s
        | { key: $k, status: $s, old: $o.value, new: $n.value }
      ]
  ;
def diff_properties:
      (.oldProperties | make_object_from_properties) as $old
    | (.newProperties | make_object_from_properties) as $new
    | diff_objects($old; $new) as $diff
    | foreach $diff[] as $d ({ id: .ID.S };
          select($d.status != "same")
        | .old = ((select(any("remove", "change"; . == $d.status)) | "[\($d.key) : \($d.old)]") // null)
        | .new = ((select(any("add", "change";    . == $d.status)) | "[\($d.key) : \($d.new)]") // null)
      )
    ;
[.Items[] | diff_properties]

This yields the following output:

[
  {
    "id": "ID_Complete",
    "old": "[property_A : value_A_old]",
    "new": "[property_A : value_A_new]"
  },
  {
    "id": "ID_Complete",
    "old": "[property_B : value_B_old]",
    "new": "[property_B : value_B_new]"
  },
  {
    "id": "ID_Incomplete",
    "old": null,
    "new": "[property_A : value_A_new]"
  },
  {
    "id": "ID_Incomplete",
    "old": "[property_B : value_B_old]",
    "new": "[property_B : value_B_new]"
  }
]

It seems like your data is in some kind of encoded format too. For a more robust solution, you should consider defining some functions to decode them. Consider approaches found here on how you could do that.
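As a minimal sketch of the first step described above (mapping a property list into an object), here is the capture/from_entries pipeline run on a single list. The sample list is copied from the question, the shell variable names are only illustrative, and it assumes jq 1.5 or later built with regex support.

```shell
#!/bin/sh
# One property list in the same shape as oldProperties in the question.
props='{"L":[{"S":"[property_A : value_A_old]"},{"S":"[property_B : value_B_old]"}]}'

# capture() turns each "[key : value]" string into {"key":..., "value":...};
# from_entries then folds those pairs into a single object.
as_object=$(printf '%s' "$props" | jq -c '
  [ .L[].S | capture("\\[(?<key>\\w+) : (?<value>\\w+)\\]") ] | from_entries')

echo "$as_object"
```

Once both lists are plain objects, checking whether a key exists on one side is just a has($key) test, which is what the diff step relies on.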

Upvotes: 2

jq170727

Reputation: 14645

This filter produces the desired output.

def parse: capture("(?<key>\\w+)\\s*:\\s*(?<value>\\w+)") ;
def print: "[\(.key) : \(.value)]";
def norm:   [.[][][] | parse | select(.key=="property_A") | print][0];

  .Items
| map({id:.ID.S, old:.oldProperties|norm, new:.newProperties|norm})[]

Sample Run (assumes filter in filter.jq and data in data.json)

$ jq -M -f filter.jq data.json
{
  "id": "ID_Complete",
  "old": "[property_A : value_A_old]",
  "new": "[property_A : value_A_new]"
}
{
  "id": "ID_Incomplete",
  "old": null,
  "new": "[property_A : value_A_new]"
}
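
To illustrate why this works where the command in the question does not: when select matches nothing, the old expression produces an empty stream, and an object constructor with an empty stream for any field emits no object at all. Collecting the matches into an array and taking [0] turns "no match" into null. A minimal sketch applying that idea to the original command, using a hypothetical reduced input that holds only the ID_Incomplete item:

```shell
#!/bin/sh
# Reduced input (assumption: only the ID_Incomplete item from the question).
data='{"Items":[{"ID":{"S":"ID_Incomplete"},
  "oldProperties":{"L":[{"S":"[property_B : value_B_old]"}]},
  "newProperties":{"L":[{"S":"[property_A : value_A_new]"},
                        {"S":"[property_B : value_B_new]"}]}}]}'

# Shape of the original command: "old" yields nothing, so no object is emitted.
original=$(printf '%s' "$data" | jq -c '.Items[] |
  { id:  .ID.S,
    old: (.oldProperties.L[].S | select(contains("property_A"))),
    new: (.newProperties.L[].S | select(contains("property_A"))) }')

# Wrap the "old" matches in an array and index it: [][0] is null, not empty.
fixed=$(printf '%s' "$data" | jq -c '.Items[] |
  { id:  .ID.S,
    old: ([.oldProperties.L[].S | select(contains("property_A"))][0]),
    new: (.newProperties.L[].S | select(contains("property_A"))) }')

echo "original: $original"
echo "fixed:    $fixed"
```

The new field is deliberately left unwrapped here, so items without a new property_A are still skipped, as the question asks.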


Upvotes: 1
