Reputation: 326
I'm trying to parse a JSON file in order to create a deletion list for an Artifactory instance.
I'd like to group the objects by two fields, repo and path, then keep the two objects in each group with the most recent "modified" timestamps and delete all the other objects from the JSON file.
So, an original file that looks like this:
{
  "results": [
    {
      "repo": "repo1",
      "path": "docker_image_dynamic",
      "size": 3624,
      "modified": "2016-10-01T06:22:16.335Z"
    },
    {
      "repo": "repo1",
      "path": "docker_image_dynamic",
      "size": 3646,
      "modified": "2016-10-01T07:03:58.465Z"
    },
    {
      "repo": "repo1",
      "path": "docker_image_dynamic",
      "size": 3646,
      "modified": "2016-10-01T07:06:36.522Z"
    },
    {
      "repo": "repo2",
      "path": "docker_image_static",
      "size": 3624,
      "modified": "2016-09-29T20:31:44.054Z"
    }
  ]
}
should become this:
{
  "results": [
    {
      "repo": "repo1",
      "path": "docker_image_dynamic",
      "size": 3646,
      "modified": "2016-10-01T07:03:58.465Z"
    },
    {
      "repo": "repo1",
      "path": "docker_image_dynamic",
      "size": 3646,
      "modified": "2016-10-01T07:06:36.522Z"
    },
    {
      "repo": "repo2",
      "path": "docker_image_static",
      "size": 3624,
      "modified": "2016-09-29T20:31:44.054Z"
    }
  ]
}
Upvotes: 1
Views: 1881
Reputation: 116919
Comments aside, here is a more concise (and more jq-esque (*)) way to express @jq170727's solution:
.results |= [reduce .[] as $r ( {};
.[$r.repo][$r.path] |= ((.+[$r]) | sort_by(.modified)[-2:]))
| .[][][]]
(*) Specifically: no getpath, setpath, or $new, and using |= reduces redundancy.
Upvotes: 1
Reputation: 116919
@jq170727 makes a good point about the potential inefficiency of using group_by, since group_by involves sorting. In practice, the sort is probably too fast to matter, but if it is a concern, we can easily define our own sort-free version of group_by:
# sort-free variant of group_by/1
# f must always evaluate to a string.
# Output: an object
def GROUP_BY(f): reduce .[] as $x ({}; .[$x|f] += [$x] );
@JeffMercado's solution can now be used with the help of tojson as follows:
.results |= [GROUP_BY({repo,path}|tojson)[] | sort_by(.modified)[-2:][]]
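For readers who want to check the logic outside jq, here is a rough Python sketch of the same sort-free idea; it is not part of the original answer. As with GROUP_BY, each record is bucketed in a single pass under a string key (here `json.dumps` of repo and path, standing in for `{repo,path}|tojson`), and only each group's members are then sorted. The names `group_by`, `repo_path_key` and `keep_latest_two` are hypothetical.

```python
import json

def group_by(items, key):
    """Sort-free grouping: one pass, bucketing each item under key(item).

    Python dicts preserve insertion order, so groups appear in the order
    their first member was seen.
    """
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups

def repo_path_key(record):
    # Hypothetical helper mirroring {repo,path}|tojson: serialize the two
    # fields to a JSON string so the composite key is hashable.
    return json.dumps({"repo": record["repo"], "path": record["path"]})

def keep_latest_two(doc):
    groups = group_by(doc["results"], repo_path_key)
    kept = []
    for members in groups.values():
        # Only the members of one group are sorted, never the whole list.
        kept.extend(sorted(members, key=lambda r: r["modified"])[-2:])
    # Like `.results |= ...`, leave any other top-level keys untouched.
    return {**doc, "results": kept}
```

Calling `keep_latest_two(json.load(f))` on the question's input should keep the last two repo1 records and the single repo2 record.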
To avoid the call to tojson, we can tweak the above to produce the following even faster solution:
def GROUP_BY(f;g): reduce .[] as $x ({}; .[$x|f][$x|g] += [$x]);
.results |= [GROUP_BY(.repo;.path)[][] | sort_by(.modified)[-2:][]]
Upvotes: 1
Reputation: 14715
Here is a more cumbersome solution which uses reduce to maintain a temporary object with the last two values for each repo and path. It's probably not better than Jeff's solution unless the input contains a large number of entries for each combination of (repo, path):
{
results: [
reduce .results[] as $r (
{} # temporary object
; (
getpath([$r.repo, $r.path]) # keep the latest two
| . + [$r] # elements for each
| sort_by(.modified)[-2:] # repo and path in a
) as $new # temporary object
| setpath([$r.repo, $r.path]; $new) #
)
| .[] | .[] | .[] # extract saved elements
]
}
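The point of this reduce is that the temporary object never holds more than two records per (repo, path), so memory stays bounded and each step sorts at most three items. A minimal Python sketch of the same bookkeeping, offered only as an illustration (the function name is hypothetical), uses a nested dict in place of getpath/setpath:

```python
def keep_latest_two_streaming(results):
    # Temporary nested object acc[repo][path] -> at most two records,
    # playing the role of the getpath/setpath bookkeeping in the jq reduce.
    acc = {}
    for r in results:
        by_path = acc.setdefault(r["repo"], {})
        candidates = by_path.get(r["path"], []) + [r]
        # At most three records are sorted at any one step.
        by_path[r["path"]] = sorted(candidates, key=lambda x: x["modified"])[-2:]
    # Flatten repo -> path -> records, like `.[] | .[] | .[]`.
    return [rec
            for by_path in acc.values()
            for kept in by_path.values()
            for rec in kept]
```

The trade-off matches the one described above: the per-record trimming only pays off when some (repo, path) groups are large.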
Upvotes: 1
Reputation: 134571
This should do it:
.results |= [group_by({repo,path})[] | sort_by(.modified)[-2:][]]
After grouping the items in the array by repo and path, you sort each group by modified and keep the last two items of the sorted group (the two most recent). Then you split the groups up again and collect the items into a new array.
Upvotes: 2