Reputation: 1894
I have a set of YAML files, each describing a project, with a key sdgs that contains a list of numbers representing the Sustainable Development Goals.
I would like to merge all the files and convert them to JSON in a different format, with the SDG indexes as keys and the related projects as list values.
Input:
---
# gnu_health.yaml
description: >
  GNU Health is a Free/Libre project for health practitioners, health
  institutions and governments. It provides the functionality of Electronic
  Medical Record (EMR), Hospital Management (HMIS) and Health Information
  System (HIS).
sdgs: [3]
name: GNU Health
---
# a11y.yaml
description: >
  This Accessibility Project is a community-driven effort to make web
  accessibility easier by leveraging a worldwide community of developer
  knowledge.
sdgs: [10]
name: A11Y
---
# bahmni.yaml
description: >
  Bahmni is an Open Source hospital Management System focusing
  on poor/underserved and public hospitals in the developing
  world.
  It's aimed to being a generic system which can be used for
  multiple diseases and hospitals in different countries.
sdgs: [1, 3]
name: Bahmni
Expected Output:
{
  "1": [
    {
      "name": "Bahmni",
      "description": "..."
    }
  ],
  "3": [
    {
      "name": "GNU Health",
      "description": "..."
    },
    {
      "name": "Bahmni",
      "description": "..."
    }
  ],
  "10": [
    {
      "name": "A11Y",
      "description": "..."
    }
  ]
}
I'm finding it surprisingly difficult to figure this out using jq's filtering system, even after reading the manual and other awesome-jq resources.
Can someone point me in the right direction?
Current best effort:
# use as follows: yq -f $binDir/concat_sdgs.jq $srcDir/*.y*ml
# concat_sdgs.jq
{
  (.sdgs[]|tostring): [.]
}
Unfortunately, this does not merge projects that share the same SDG.
Current incorrect output:
{
  "1": [
    {
      "name": "Bahmni",
      "description": "..."
    }
  ],
  "3": [
    {
      "name": "GNU Health",
      "description": "..."
    }
  ],
  "3": [
    {
      "name": "Bahmni",
      "description": "..."
    }
  ],
  "10": [
    {
      "name": "A11Y",
      "description": "..."
    }
  ]
}
Upvotes: 1
Views: 529
Reputation: 116670
The good news is that you're close.
For simplicity, I'm going to assume that the .yaml conversion to .json has already been done. Slightly adapting your filter, it's easy to see that:
jq '{ (.sdgs[]|tostring): del(.sdgs) }' a11y.json gnu_health.json bahmni.json
produces a stream of four single-key objects corresponding closely to what you want.
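To make that stream concrete, here is a minimal sketch using shortened stand-ins for the three converted files. The temporary directory, the "..." descriptions, and the `stream` variable are illustrative assumptions rather than part of the original setup, and it assumes jq is installed.

```shell
# Hypothetical, shortened stand-ins for the three converted .json files.
tmp=$(mktemp -d)
printf '%s\n' '{"description":"...","sdgs":[10],"name":"A11Y"}' > "$tmp/a11y.json"
printf '%s\n' '{"description":"...","sdgs":[3],"name":"GNU Health"}' > "$tmp/gnu_health.json"
printf '%s\n' '{"description":"...","sdgs":[1,3],"name":"Bahmni"}' > "$tmp/bahmni.json"

# -c prints one result per line, so each object in the stream is easy to see.
stream=$(jq -c '{ (.sdgs[]|tostring): del(.sdgs) }' \
  "$tmp/a11y.json" "$tmp/gnu_health.json" "$tmp/bahmni.json")
printf '%s\n' "$stream"
```

This should print four separate single-key objects, keyed "10", "3", "1", and "3" in that order. Bahmni contributes two of them because its sdgs list has two entries, and the two "3" objects are not yet merged.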
Combining them into a single object is a tiny bit tricky. To keep things simple, let's first define a helper function which can be used to group single-key objects by key:
def group_by_keys: reduce .[] as $o ({};
  reduce ($o | to_entries[]) as $kv (.; .[$kv.key] += [$kv.value]));
Next, we'll use inputs with the -n command-line option:
jq -n '
  def group_by_keys: reduce .[] as $o ({};
    reduce ($o | to_entries[]) as $kv (.; .[$kv.key] += [$kv.value]));
  [inputs | {(.sdgs[]|tostring): del(.sdgs) }] | group_by_keys
' a11y.json gnu_health.json bahmni.json
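(Don't forget -n. Without it, jq consumes the first file as its regular input, and inputs only sees the remaining files.)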
If the ordering of the keys is important, pipe the result through this filter, which sorts the keys numerically:
def sort_by_keys:
  to_entries
  | sort_by(.key|tonumber)
  | from_entries;
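As a sketch of how the pieces could fit together, the sort can be appended to the end of the grouping pipeline. The sample files below are shortened, hypothetical stand-ins for the converted YAML, and the `tmp` and `result` variables exist only for illustration.

```shell
# Hypothetical, shortened stand-ins for the three converted .json files.
tmp=$(mktemp -d)
printf '%s\n' '{"description":"...","sdgs":[10],"name":"A11Y"}' > "$tmp/a11y.json"
printf '%s\n' '{"description":"...","sdgs":[3],"name":"GNU Health"}' > "$tmp/gnu_health.json"
printf '%s\n' '{"description":"...","sdgs":[1,3],"name":"Bahmni"}' > "$tmp/bahmni.json"

result=$(jq -n -c '
  # Collect the values of single-key objects into one array per key.
  def group_by_keys: reduce .[] as $o ({};
    reduce ($o | to_entries[]) as $kv (.; .[$kv.key] += [$kv.value]));
  # Rebuild the object with its keys in numeric order.
  def sort_by_keys:
    to_entries
    | sort_by(.key|tonumber)
    | from_entries;
  [inputs | {(.sdgs[]|tostring): del(.sdgs) }] | group_by_keys | sort_by_keys
' "$tmp/a11y.json" "$tmp/gnu_health.json" "$tmp/bahmni.json")
printf '%s\n' "$result"
```

With this sample data the keys should come out in the order "1", "3", "10". The tonumber step matters here: a plain string sort, such as jq's -S option, would place "10" before "3".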
Upvotes: 3