Merge two YAML files so a list gets extra values if necessary

Question

In my shell script, a command reads a YAML file, and I want that file to be slightly different depending on the value of a variable available within the script.

The YAML file is as follows:

name: John
friends:
  - friendName: Bob
    family:
      extended:
        cousins:
          - Sarah
          - Jane
          - Tom
age: 12

However, depending on the value of that variable, I might instead want the YAML file to look like this (i.e. with more values added to the cousins list):

name: John
friends:
  - friendName: Bob
    family:
      extended:
        cousins:
          - Sarah
          - Jane
          - Tom
          - Arthur
          - Michael
          - Frank
age: 12

I understand that YAML does not support conditional statements, and ideally I'd like to avoid repeating content in the YAML file. Is there a way to merge two YAML files so that, if the same list is present in both files, the lists are combined (which would achieve my goal of adding more values to the cousins list)?

flyx · Accepted Answer

With yq (v4), the multiply (merge) operator can do this through its variant *+, which concatenates sequences:

yq eval-all 'select(fileIndex == 0) *+ select(fileIndex == 1)' file1.yaml file2.yaml

Given the inputs:

# file1.yaml
---
- one
- two
- three
...
# file2.yaml
---
- three
- four
- five

You will get:

- one
- two
- three
- three
- four
- five
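To make the *+ semantics concrete, here is a small Python model of it on already-parsed data (plain dicts and lists). It is a hypothetical sketch, not yq's implementation, and it assumes that mappings merge recursively, sequences concatenate, and for any other pair of values the right-hand side wins:

```python
# Hypothetical helper, NOT yq's implementation: a plain-Python model of
# the "*+" merge, assuming mappings merge recursively, sequences are
# concatenated, and for any other pair of values the right-hand side wins.
def merge_append(left, right):
    if isinstance(left, dict) and isinstance(right, dict):
        merged = dict(left)  # shallow copy; left's key order comes first
        for key, value in right.items():
            if key in merged:
                merged[key] = merge_append(merged[key], value)
            else:
                merged[key] = value  # keys only present on the right are appended
        return merged
    if isinstance(left, list) and isinstance(right, list):
        return left + right  # concatenation: shared items end up duplicated
    return right


file1 = ["one", "two", "three"]
file2 = ["three", "four", "five"]
print(merge_append(file1, file2))
# ['one', 'two', 'three', 'three', 'four', 'five']

# The same rule applies to sequences nested inside mappings.
print(merge_append({"some": {"list": ["one"]}}, {"some": {"list": ["two"]}}))
# {'some': {'list': ['one', 'two']}}
```

The duplicated "three" is exactly what the yq output above shows: concatenation never looks at the values.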

Note that concatenation means any value present in both lists gets duplicated. There is nothing you can do about that with *+; it is simply how list concatenation works. If you don't want duplicate items, you have to use sets instead, which have exactly that semantic. In YAML, you write a set as a mapping whose keys have no (null) values:

# file1.yaml
---
? one
? two
? three
...
# file2.yaml
---
? three
? four
? five

Applying the same operation to those gives you:

one:
two:
three:
four:
five:

(Yes, yq can't keep the ? syntax here, which is nicer for pure sets, but the semantics are the same.)
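The set behaviour can be modelled the same way: a set is just a mapping with null values, and merging mappings can only ever produce each key once. A minimal Python sketch, assuming the merge acts as a key union that keeps the first file's order (which matches the output above):

```python
# A YAML set written with "? key" entries loads as a mapping whose values
# are null; in Python that is a dict with None values.
set1 = {"one": None, "two": None, "three": None}
set2 = {"three": None, "four": None, "five": None}

# Merging two mappings unions their keys: "three" survives only once,
# set1's keys keep their order, and keys new in set2 are appended.
merged = {**set1, **set2}
print(list(merged))
# ['one', 'two', 'three', 'four', 'five']
```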

But what if the final YAML has to contain lists? We still need sets to do the merging, but we can transform them into sequences afterwards. For example, these inputs:

# file1.yaml
---
some:
  list:
    ? one
    ? two
    ? three
...
# file2.yaml
---
some:
  list:
    ? three
    ? four
    ? five

processed with this command:

yq eval-all 'select(fileIndex == 0) *+ select(fileIndex == 1) | .some.list |= keys' file1.yaml file2.yaml

will give you:

some:
  list:
    - one
    - two
    - three
    - four
    - five
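In the same hypothetical Python model, that command is two steps: a recursive mapping merge, then replacing the merged set with its keys. `merge_maps` is an illustrative helper, not part of yq:

```python
def merge_maps(left, right):
    # Recursive mapping merge; for non-mapping values the right side wins.
    merged = dict(left)
    for key, value in right.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = merge_maps(merged[key], value)
        else:
            merged[key] = value
    return merged


doc1 = {"some": {"list": {"one": None, "two": None, "three": None}}}
doc2 = {"some": {"list": {"three": None, "four": None, "five": None}}}

merged = merge_maps(doc1, doc2)
# Counterpart of `.some.list |= keys`: replace the set with its keys, in order.
merged["some"]["list"] = list(merged["some"]["list"].keys())
print(merged)
# {'some': {'list': ['one', 'two', 'three', 'four', 'five']}}
```

Because `merge_maps` builds new dicts for every nested mapping it merges, rewriting `merged["some"]["list"]` leaves `doc1` untouched.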

Since we can modify the data on the fly, we can also accept sequences as input: convert them to sets, combine them, and then rewrite the result as a sequence. With this input:

# file1.yaml
---
some:
  list:
    - one
    - two
    - three
...
# file2.yaml
---
some:
  list:
    - three
    - four
    - five

and this command:

yq eval-all '
  (select(fileIndex == 0) | .some.list |= ((.[] | {.: ""}) as $item
    ireduce ({}; . * $item)))
*+
  (select(fileIndex == 1) | .some.list |= ((.[] | {.: ""}) as $item
    ireduce ({}; . * $item)))
| .some.list |= keys' file1.yaml file2.yaml

you get:

some:
  list:
    - one
    - two
    - three
    - four
    - five
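The whole round trip (sequence to set, merge, set back to sequence) can be sketched in Python as well. `list_to_set`, `merge_maps` and `merge_unique` are hypothetical helpers that only model the yq pipeline on the sample documents above, and they assume `.some.list` exists in both inputs:

```python
import copy


def list_to_set(items):
    # Mirrors `(.[] | {.: ""}) as $item ireduce ({}; . * $item)`: start
    # from {} and merge in a one-key mapping {item: ""} for every item.
    as_set = {}
    for item in items:
        as_set = {**as_set, item: ""}
    return as_set


def merge_maps(left, right):
    # Recursive mapping merge; for non-mapping values the right side wins.
    merged = dict(left)
    for key, value in right.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = merge_maps(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_unique(doc1, doc2):
    converted = []
    for doc in (doc1, doc2):
        doc = copy.deepcopy(doc)  # leave the caller's documents untouched
        doc["some"]["list"] = list_to_set(doc["some"]["list"])
        converted.append(doc)
    # The `*+` step. Only mappings are left in these sample documents, so a
    # plain recursive mapping merge behaves the same here.
    merged = merge_maps(converted[0], converted[1])
    # The `.some.list |= keys` step: back from set to sequence.
    merged["some"]["list"] = list(merged["some"]["list"])
    return merged


file1 = {"some": {"list": ["one", "two", "three"]}}
file2 = {"some": {"list": ["three", "four", "five"]}}
print(merge_unique(file1, file2))
# {'some': {'list': ['one', 'two', 'three', 'four', 'five']}}
```

In plain Python, `list(dict.fromkeys(first + second))` gives the same order-preserving deduplication in one line; the detour through sets is what the yq version needs because *+ itself only concatenates.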

Now the question is: Can you justify this level of complexity for your task?
