Reputation: 298
I have a big yaml file:
---
foo: bar
baz:
bacon: true
eggs: false
---
goo: car
star:
cheese: true
water: false
---
dog: boxer
food:
turkey: true
moo: cow
---
...
What I'd like to do is split this file into n valid YAML files.
I tried doing this with csplit in bash, but I end up with either a lot more files than I want:
csplit --elide-empty-files -f rendered- example.yaml "/---/" "{*}"
or a split where the last file contains most of the content:
csplit --elide-empty-files -n 3 -f rendered- app.yaml "/---/" "{3}"
Neither is ideal. What I really want is to be able to say: split a YAML file into thirds, breaking at the closest delimiter. I know the parts won't always be exact thirds.
Any ideas on how to accomplish this in bash?
Upvotes: 2
Views: 4131
Reputation: 4049
I don't think there's a way to do this with csplit. I was able to split it into files of 1000 yaml documents each with awk:
awk '/^---/{f="rendered-"int(++i/1000)} {print > f}' app.yaml
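To get exactly three files, you could try something like the following. Note that it deals the documents out in turn, so each file holds every third document rather than a contiguous third of the input:
awk '/^---/{f="rendered-"(++i%3)} {print > f}' app.yaml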
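If the three files need to be contiguous (documents kept in their original order, as the question asks), one possible approach is to count the documents first and then switch output files every ceil(total / parts) documents. This is only a sketch: split_yaml, its three arguments, and the numeric file suffixes are names made up for illustration, and it assumes every document starts with a line beginning with "---". When the document count does not divide evenly, the last file is shorter, and fewer than the requested number of files may be written.

```shell
#!/bin/bash
# split_yaml FILE PARTS PREFIX  (hypothetical helper, not part of the answer above)
# Writes PREFIX0, PREFIX1, ... with roughly equal numbers of YAML documents,
# keeping the documents in their original order.
split_yaml() {
    local file=$1 parts=$2 prefix=$3
    local total
    # Count documents: each line starting with '---' opens one.
    total=$(grep -c '^---' "$file")
    awk -v total="$total" -v parts="$parts" -v prefix="$prefix" '
        BEGIN {
            # Documents per output file, rounded up; never less than 1.
            per = int((total + parts - 1) / parts)
            if (per < 1) per = 1
            # Lines before the first separator (if any) go to the first file.
            f = prefix "0"
        }
        # On each separator, pick the output file for the document it opens.
        /^---/ { f = sprintf("%s%d", prefix, int(n / per)); n++ }
        { print > f }
    ' "$file"
}
```

For example, split_yaml app.yaml 3 rendered- would write rendered-0, rendered-1 and rendered-2.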
Upvotes: 2
Reputation: 525
My solution is not a one-liner, but it works. It writes each YAML document to its own file:
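#!/bin/bash
# Write each YAML document (from one '---' line up to the next) to its own file.
file=example.yaml
output=output_
# One past the last line; serves as the end marker for the final document.
count=$(( $(wc -l < "$file") + 1 ))
# Line numbers of all separator lines, followed by the end marker.
lines="$(grep -n -e '^---' "$file" | cut -d: -f1) ${count}"
start=""
for n in ${lines}
do
    # Skip the first separator: there is no preceding document to write yet.
    if [ -n "${start}" ]; then
        end=$((n - 1))
        sed -n "${start},${end}p" "$file" > "${output}${start}-${end}.yaml"
    fi
    start=$n
done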
Upvotes: 0