pahool

Reputation: 326

Sorting and filtering a json file using jq

I'm trying to parse a JSON file in order to create a deletion list for an Artifactory instance.

I'd like to group the objects by two fields: repo and path. Then, for each group, I want to keep the two objects with the most recent "modified" timestamp and drop all the other objects from the JSON file.

So, an original file that looks like this:

{
  "results": [
    {
      "repo": "repo1",
      "path": "docker_image_dynamic",
      "size": 3624,
      "modified": "2016-10-01T06:22:16.335Z"
    },
    {
      "repo": "repo1",
      "path": "docker_image_dynamic",
      "size": 3646,
      "modified": "2016-10-01T07:03:58.465Z"
    },
    {
      "repo": "repo1",
      "path": "docker_image_dynamic",
      "size": 3646,
      "modified": "2016-10-01T07:06:36.522Z"
    },
    {
      "repo": "repo2",
      "path": "docker_image_static",
      "size": 3624,
      "modified": "2016-09-29T20:31:44.054Z"
    }
  ]
}

should become this:

{
  "results": [
    {
      "repo": "repo1",
      "path": "docker_image_dynamic",
      "size": 3646,
      "modified": "2016-10-01T07:03:58.465Z"
    },
    {
      "repo": "repo1",
      "path": "docker_image_dynamic",
      "size": 3646,
      "modified": "2016-10-01T07:06:36.522Z"
    },
    {
      "repo": "repo2",
      "path": "docker_image_static",
      "size": 3624,
      "modified": "2016-09-29T20:31:44.054Z"
    }
  ]
}

Upvotes: 1

Views: 1881

Answers (4)

peak

Reputation: 116919

Comments aside, here is a more concise (and more jq-esque (*)) way to express @jq170727's solution:

.results |= [reduce .[] as $r ( {};
               .[$r.repo][$r.path] |= ((.+[$r]) | sort_by(.modified)[-2:]))
             | .[][][]]

(*) Specifically no getpath, setpath, or $new; and |= lessens redundancy.

Upvotes: 1

peak

Reputation: 116919

@jq170727 makes a good point about the potential inefficiency of using group_by, since group_by involves sorting. In practice, the sort is probably too fast to matter, but if it is of concern, we can define our own sort-free version of group_by very easily:

# sort-free variant of group_by/1
# f must always evaluate to a string.
# Output: an object
def GROUP_BY(f): reduce .[] as $x ({}; .[$x|f] += [$x] );

@JeffMercado's solution can now be used with the help of tojson as follows:

.results |= [GROUP_BY({repo,path}|tojson)[] | sort_by(.modified)[-2:][]]

GROUP_BY/2

To avoid the call to tojson, we can tweak the above to produce the following even faster solution:

def GROUP_BY(f;g): reduce .[] as $x ({}; .[$x|f][$x|g] += [$x]);

.results |= [GROUP_BY(.repo;.path)[][] | sort_by(.modified)[-2:][]]
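
For readers who want to cross-check the sort-free grouping idea outside jq, here is a minimal Python sketch of the same approach. The function names group_without_sorting and latest_two_per_group are hypothetical, and the sketch assumes every "modified" value uses the same fixed-width UTC format as in the question, so that plain string comparison matches chronological order.

```python
from collections import defaultdict


def group_without_sorting(items, key):
    """Bucket items by key(item) in one pass, with no sort of the whole input."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return groups


def latest_two_per_group(doc):
    """Keep the two most recently modified entries per (repo, path)."""
    groups = group_without_sorting(
        doc["results"], key=lambda r: (r["repo"], r["path"])
    )
    kept = []
    for members in groups.values():
        # Only each (usually small) group is sorted, not the full list.
        kept.extend(sorted(members, key=lambda r: r["modified"])[-2:])
    return {**doc, "results": kept}
```

Groups come out in order of first appearance here, which is not guaranteed to match the order jq produces.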

Upvotes: 1

jq170727

Reputation: 14715

Here is a more cumbersome solution which uses reduce to maintain a temporary object holding the latest two values for each repo and path, so at most three candidates per group are ever sorted at once. It's probably not better than Jeff's solution unless the input contains a large number of entries for each combination of (repo, path):

    {
      results: [
        reduce .results[] as $r (
             {}                                 # temporary object
           ; (
                getpath([$r.repo, $r.path])     # keep the latest two
              | . + [$r]                        # elements for each
              | sort_by(.modified)[-2:]         # repo and path in a
             ) as $new                          # temporary object
           | setpath([$r.repo, $r.path]; $new)  #
        )
        | .[] | .[] | .[]                       # extract saved elements
      ]
    }
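
The running "latest two" idea can also be sketched in Python as a rough cross-check. This is not a translation of the jq program line by line: the function name running_latest_two is hypothetical, a dict keyed by a (repo, path) tuple stands in for the nested temporary object, and the sketch assumes the "modified" strings share one fixed-width UTC format so they compare chronologically as text.

```python
def running_latest_two(results):
    """Single pass that never stores more than two entries per (repo, path)."""
    latest = {}  # plays the role of the temporary object in the reduce
    for r in results:
        key = (r["repo"], r["path"])
        # Add the new entry to the (at most two) saved ones, then trim again.
        candidates = latest.get(key, []) + [r]
        latest[key] = sorted(candidates, key=lambda x: x["modified"])[-2:]
    # Flatten the saved entries back into one list.
    return [r for kept in latest.values() for r in kept]
```

Because each group is trimmed as soon as it grows past two entries, memory use depends on the number of distinct (repo, path) pairs rather than on the total number of entries.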

Upvotes: 1

Jeff Mercado

Reputation: 134571

This should do it:

.results |= [group_by({repo,path})[] | sort_by(.modified)[-2:][]]

After grouping the items in the array by repo and path, sort each group by modified and keep the last two items of the sorted group. Then split the groups up again and collect their items into a new array.
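
As a way to sanity-check the result outside jq, the same group, sort, and trim steps can be sketched in Python. This is only an illustration: the function name keep_latest_two is hypothetical, the sample is a shortened copy of the question's data, and the sketch assumes all "modified" strings use the same fixed-width UTC format so that string order equals time order. The order of groups in the output may differ from jq's.

```python
from itertools import groupby


def keep_latest_two(doc):
    """Group results by (repo, path), keep the two newest entries per group."""
    by_repo_path = lambda r: (r["repo"], r["path"])
    kept = []
    # groupby only merges adjacent items, so sort by the grouping key first.
    for _, group in groupby(sorted(doc["results"], key=by_repo_path), key=by_repo_path):
        kept.extend(sorted(group, key=lambda r: r["modified"])[-2:])
    return {**doc, "results": kept}


sample = {
    "results": [
        {"repo": "repo1", "path": "docker_image_dynamic", "size": 3624,
         "modified": "2016-10-01T06:22:16.335Z"},
        {"repo": "repo1", "path": "docker_image_dynamic", "size": 3646,
         "modified": "2016-10-01T07:03:58.465Z"},
        {"repo": "repo1", "path": "docker_image_dynamic", "size": 3646,
         "modified": "2016-10-01T07:06:36.522Z"},
        {"repo": "repo2", "path": "docker_image_static", "size": 3624,
         "modified": "2016-09-29T20:31:44.054Z"},
    ]
}

trimmed = keep_latest_two(sample)
for entry in trimmed["results"]:
    print(entry["repo"], entry["path"], entry["modified"])
```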

Upvotes: 2
