Adrien Lemaire

Reputation: 1894

Merge yaml files and re-organize by values of key value list

I have a list of yaml files, each describing a project, with a key sdgs which contains a list of numbers representing the Sustainable Development Goals.

I would like to merge all files and convert them to json in a different format, with sdg indexes as key, and related projects as list values.

Input:

---
# gnu_health.yaml
description: >
  GNU Health is a Free/Libre project for health practitioners, health
  institutions and governments. It provides the functionality of Electronic
  Medical Record (EMR), Hospital Management (HMIS) and Health Information
  System (HIS).
sdgs: [3]
name: GNU Health 

---
# a11y.yaml
description: >
  This Accessibility Project is a community-driven effort to make web
  accessibility easier by leveraging a worldwide community of developer
  knowledge.
sdgs: [10]
name: A11Y

---
# bahmni.yaml
description: >
  Bahmni is an Open Source hospital Management System focusing
  on poor/underserved and public hospitals in the developing
  world.
  It's aimed to being a generic system which can be used for
  multiple diseases and hospitals in different countries.
sdgs: [1, 3]
name: Bahmni

Expected Output:

{
  "1": [
    {
      "name": "Bahmni",
      "description": "..."
    }
  ],
  "3": [
    {
      "name": "GNU Health",
      "description": "..."
    },
    {
      "name": "Bahmni",
      "description": "..."
    }
  ],
  "10": [
    {
      "name": "A11Y",
      "description": "..."
    }
  ]
}

I'm finding it surprisingly difficult to figure this out using jq's filtering system, even after reading the manual and other awesome-jq resources.

Can someone point me in the right direction?

Current best effort:

# Usage: yq -f $binDir/concat_sdgs.jq $srcDir/*.y*ml

# concat_sdgs.jq
{
  (.sdgs[]|tostring): [.]
}

Unfortunately, this does not merge projects that share the same SDG.

Current incorrect output:

{
  "1": [
    {
      "name": "Bahmni",
      "description": "..."
    }
  ],
  "3": [
    {
      "name": "GNU Health",
      "description": "..."
    }
  ],
  "3": [
    {
      "name": "Bahmni",
      "description": "..."
    }
  ],
  "10": [
    {
      "name": "A11Y",
      "description": "..."
    }
  ]
}

Upvotes: 1

Views: 529

Answers (1)

peak

Reputation: 116670

The good news is that you're close.

For simplicity, I'm going to assume that the .yaml conversion to .json has already been done. Slightly adapting your filter, it's easy to see that:

jq '{ (.sdgs[]|tostring): del(.sdgs) }' a11y.json gnu_health.json bahmni.json

produces a stream of four single-key objects corresponding closely to what you want.
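To make that stream concrete, here is a small sketch. The inline sample data is a stand-in for the three converted files (descriptions shortened to "..."), and the variable names `projects` and `stream` are only for illustration:

```shell
# Hypothetical stand-in for a11y.json, gnu_health.json and bahmni.json,
# fed to jq on stdin instead of as file arguments.
projects='
{"description": "...", "sdgs": [10], "name": "A11Y"}
{"description": "...", "sdgs": [3], "name": "GNU Health"}
{"description": "...", "sdgs": [1, 3], "name": "Bahmni"}
'

# One single-key object is emitted per (project, sdg) pair, so Bahmni
# contributes two objects and the other projects one each.
stream=$(printf '%s\n' "$projects" | jq -c '{ (.sdgs[]|tostring): del(.sdgs) }')

printf '%s\n' "$stream"
```

Each output line is an object whose only key is the SDG number as a string and whose value is the project without its sdgs field.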

To combine them into a single object is a tiny bit tricky. To keep things simple, let's first define a helper function which can be used to group single-key objects by key:

  def group_by_keys: reduce .[] as $o ({};
     reduce ($o | to_entries[]) as $kv (.; .[$kv.key] += [$kv.value]));

Next, we'll use inputs with the -n command line option:

jq -n '
  def group_by_keys: reduce .[] as $o ({}; 
     reduce ($o | to_entries[]) as $kv (.; .[$kv.key] += [$kv.value]));
  [inputs | {(.sdgs[]|tostring): del(.sdgs) }] | group_by_keys

' a11y.json gnu_health.json bahmni.json

(Don't forget -n.)

If the ordering of the keys is important, then define this filter as well and append `| sort_by_keys` to the pipeline:

def sort_by_keys:
  to_entries
  | sort_by(.key|tonumber)
  | from_entries;
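Putting the two helpers together might look like the sketch below. It assumes the YAML has already been converted to JSON, and it uses inline sample data (descriptions shortened to "...") in place of the real files; the variable names `projects` and `result` are only for illustration.

```shell
# Hypothetical sample data: one JSON document per project, as if already
# converted from YAML.
projects='
{"description": "...", "sdgs": [3], "name": "GNU Health"}
{"description": "...", "sdgs": [10], "name": "A11Y"}
{"description": "...", "sdgs": [1, 3], "name": "Bahmni"}
'

result=$(printf '%s\n' "$projects" | jq -n -c '
  # Collect the values of single-key objects into arrays, keyed by SDG.
  def group_by_keys: reduce .[] as $o ({};
     reduce ($o | to_entries[]) as $kv (.; .[$kv.key] += [$kv.value]));
  # Order the keys numerically, so that "10" comes after "3".
  def sort_by_keys:
    to_entries
    | sort_by(.key|tonumber)
    | from_entries;
  [inputs | {(.sdgs[]|tostring): del(.sdgs)}] | group_by_keys | sort_by_keys
')

printf '%s\n' "$result"
```

With this data the resulting object has the keys "1", "3" and "10" in that order, and the "3" array holds GNU Health followed by Bahmni, matching the order in which the inputs were read.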

Upvotes: 3
