mootpt

Reputation: 298

Split massive yaml file into N valid yaml files

I have a big yaml file:

---
foo: bar
baz:
  bacon: true
  eggs: false
---
goo: car
star:
  cheese: true
  water: false
---
dog: boxer
food:
  turkey: true
  moo: cow
---
...

What I'd like to do is split this file into N valid YAML files.

I tried doing this with csplit in bash, but I end up with either a lot more files than I want:

csplit --elide-empty-files -f rendered- example.yaml "/---/" "{*}"

or a split where the last file contains most of the content:

csplit --elide-empty-files -n 3 -f rendered- app.yaml "/---/" "{3}"

Neither result is ideal. What I really want is to be able to say: split a YAML file into thirds, cutting at the delimiter closest to each third. I know the pieces won't always be exact thirds.

Any ideas on how to accomplish this in bash?

Upvotes: 2

Views: 4131

Answers (2)

Neil

Reputation: 4049

I don't think there's a way to do this with csplit. I was able to split it into files of 1000 yaml documents each with awk:

awk '/---/{f="rendered-"int(++i/1000);}{print > f;}' app.yaml

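To get exactly three files, you could try something like the following. Note that it deals the documents out in rotation, so each file holds every third document rather than a contiguous third of the input: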

awk '/---/{f="rendered-"(++i%3);}{print > f;}' app.yaml
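If the pieces need to stay contiguous, which is closer to what the question asks for, one possible variation is to pick the output file from the line number at which each document starts. This is only a sketch: split_yaml and its three arguments are hypothetical names of mine, and it assumes every document begins with a line starting with ---.

```shell
# split_yaml FILE PARTS PREFIX (hypothetical helper, not part of the one-liners above)
# Writes PREFIX0.yaml, PREFIX1.yaml, ... keeping documents in their original order.
split_yaml() {
    local file=$1 parts=$2 prefix=$3
    local total
    # Count lines with awk so a final line without a newline is still counted.
    total=$(awk 'END { print NR }' "$file")
    awk -v total="$total" -v parts="$parts" -v prefix="$prefix" '
        # At each separator, choose the chunk that this line number falls into.
        /^---/ { f = sprintf("%s%d.yaml", prefix, int((NR - 1) * parts / total)) }
        # Lines before the first separator have no target file and are skipped.
        f != "" { print > f }
    ' "$file"
}
```

Called as split_yaml app.yaml 3 rendered-, a document is never cut in half, so the pieces are only roughly equal in size. If a single document spans more than one third of the file, fewer than three files are produced.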

Upvotes: 2

Yuji

Reputation: 525

My idea is not a one-liner, but this works.

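#!/bin/bash
file=example.yaml
output=output_
# Total line count plus one, used as the end marker for the last document.
count=$(wc -l < "${file}")
count=$((count + 1))
# Line numbers of every document separator (anchored so "---" inside a value is ignored).
lines=$(grep -n -e '^---' "${file}" | awk -F: '{ print $1 }')
lines="${lines} ${count}"
# The first separator is the initial start; the remaining numbers mark where each document ends.
start=$(echo ${lines} | awk '{ print $1 }')
lines=$(echo ${lines} | sed 's/^[0-9]*//')

for n in ${lines}
do
    end=$((n - 1))
    sed -n "${start},${end}p" "${file}" > "${output}${start}-${end}.yaml"
    start=$n
done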
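Since the script names each piece after its line range, a quick sanity check is to glue the pieces back together in numeric order and compare the result with the input. A rough sketch, where check_split is a hypothetical helper name and the input is assumed to start with a --- line (anything before the first separator is not written to any piece):

```shell
# check_split FILE PREFIX (hypothetical helper)
# Succeeds if the PREFIX<start>-<end>.yaml pieces, joined in order, equal FILE.
check_split() {
    local file=$1 prefix=$2
    local piece name
    for piece in "${prefix}"*.yaml; do
        name=${piece#"$prefix"}                        # e.g. 6-10.yaml
        printf '%s\t%s\n' "${name%%-*}" "$piece"       # start line, then path
    done | sort -n | cut -f2- |
    while IFS= read -r piece; do cat "$piece"; done |
    cmp -s - "$file"
}
```

For the script above this would be run as check_split example.yaml output_ and the exit status tells you whether anything was lost.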

Upvotes: 0
