Reputation: 811
I am trying to split a big text file after every n empty lines. The file uses exactly one empty line as a data separator, like below:
Lorem ipsum
Lorem ipsum
Lorem ipsum

Lorem ipsum
Lorem ipsum

Lorem ipsum

Lorem ipsum
Lorem ipsum

Lorem
Lorem
...
I have tried using csplit:
csplit data.txt /^$/ {3}
I expected it to split the file after every 3 empty lines (not consecutive ones, but each time 3 more empty lines have been read) and to keep doing so. Instead, it splits the file at every empty line.
My expected files:

xx00:
Lorem ipsum
Lorem ipsum
Lorem ipsum

Lorem ipsum
Lorem ipsum

Lorem ipsum

xx01:
Lorem ipsum
Lorem ipsum

Lorem
Lorem
Any suggestions?
Upvotes: 1
Views: 99
Reputation: 2875
rm -f xx*    # output is appended (>>), so remove old chunk files first
# works with mawk or gawk
awk '{
print >> sprintf("xx%0*.f%.*s", __-(_~_),
int(_/__),_<_,_+=!NF) }' FS='^$' __=3 data.txt
Result:

-rw-r--r-- 1 501 75 Jun 8 09:19:10 2022 xx00
-rw-r--r-- 1 501 37 Jun 8 09:19:10 2022 xx01

xx00:
1 Lorem ipsum
2 Lorem ipsum
3 Lorem ipsum
4
5 Lorem ipsum
6 Lorem ipsum
7
8 Lorem ipsum
9

xx01:
1 Lorem ipsum
2 Lorem ipsum
3
4 Lorem
5 Lorem
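The one-liner above is deliberately golfed. Reading it: _ starts empty and is bumped by _+=!NF whenever a line is empty (NF is 0), int(_/__) turns that count into a file number, __-(_~_) works out to a zero-padding width of 2, and %.*s with precision _<_ (that is, 0) prints nothing, so the counter update is only a side effect. Judging by the output above, the file number is computed before the counter moves, which keeps every third empty line at the end of its file. A more readable sketch of the same idea, assuming any POSIX awk; data.txt, blanks and the fixed two-digit suffix are illustrative choices, and the close() call is my addition for inputs that produce many chunks:

```shell
# Work in a scratch directory so old xx* files cannot get mixed in.
cd "$(mktemp -d)" || exit 1

# Sample input from the question: records separated by one empty line.
printf '%s\n' 'Lorem ipsum' 'Lorem ipsum' 'Lorem ipsum' '' \
  'Lorem ipsum' 'Lorem ipsum' '' 'Lorem ipsum' '' \
  'Lorem ipsum' 'Lorem ipsum' '' 'Lorem' 'Lorem' > data.txt

awk -v n=3 '
{
    # File number: empty lines seen so far, divided by n.
    out = sprintf("xx%02d", int(blanks / n))
    print > out
    # Count this line only after printing it, so the nth empty line
    # stays at the end of the current file.
    if ($0 == "") {
        blanks++
        # Chunk finished: close it so a huge input does not exhaust file descriptors.
        if (blanks % n == 0) close(out)
    }
}' data.txt
```

Unlike the golfed version this uses > rather than >>, so rerunning it overwrites the chunk files instead of appending to them.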
Upvotes: 0
Reputation: 5251
awk is good for this.
Split every n empty lines, naming the files with:
No leading zeroes:
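awk -v n=3 '
    $0 == "" {++c}                              # count empty lines in the current chunk
    c <= n   {print > ("xx" f+0)}               # xx0, xx1, ... (f+0 avoids a bare "xx")
    c == n   {close("xx" f+0); c = 0; ++f}      # chunk done: close it, start the next
' file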
With a minimum width, zero-padded to width digits:
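awk -v n=3 -v width=2 '
    BEGIN    {f = sprintf("%0*d", width, 0)}    # so the first file is xx00, not xx
    $0 == "" {++c}
    c <= n   {print > ("xx" f)}
    c == n   {close("xx" f); c = 0; ++f; f = sprintf("%0*d", width, f)}
' file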
To remove the trailing empty line in each file, just change c <= n to c < n.
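With c < n, every line except the nth empty line of each chunk is written, so on the question's sample xx00 should end up with 8 lines instead of 9. A small sketch to check that; for brevity it replaces the width/sprintf naming with a fixed %02d suffix, and the out variable, the close() call and data.txt are my own additions:

```shell
# Scratch directory plus the sample input from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Lorem ipsum' 'Lorem ipsum' 'Lorem ipsum' '' \
  'Lorem ipsum' 'Lorem ipsum' '' 'Lorem ipsum' '' \
  'Lorem ipsum' 'Lorem ipsum' '' 'Lorem' 'Lorem' > data.txt

awk -v n=3 '
    { out = sprintf("xx%02d", f) }          # f is unset (0) for the first file
    $0 == "" { ++c }
    c < n    { print > out }                # the nth empty line is skipped
    c == n   { close(out); c = 0; ++f }     # move on to the next file
' data.txt
```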
Upvotes: 0
Reputation: 785651
This awk should also work, using an empty RS (paragraph mode, one record per block). Note that RT is a GNU awk extension, so this needs gawk:
awk -v n=3 -v RS= '{ORS=RT; print > sprintf("xx%02d", int((NR-1)/n))}' file
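Because RT only exists in gawk, in mawk or BSD awk ORS=RT sets ORS to an empty string and the records run together. A hedged, portable sketch that keeps the paragraph-mode idea (RS empty, NR counting records) but writes the separating empty line itself: the separator owed by each record is written lazily, just before the next record, so the last record of the input gets no trailing blank line. The pending variable and data.txt are illustrative, and it assumes records are separated by exactly one empty line, as in the question:

```shell
# Scratch directory plus the sample input from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Lorem ipsum' 'Lorem ipsum' 'Lorem ipsum' '' \
  'Lorem ipsum' 'Lorem ipsum' '' 'Lorem ipsum' '' \
  'Lorem ipsum' 'Lorem ipsum' '' 'Lorem' 'Lorem' > data.txt

awk -v n=3 -v RS= '
{
    out = sprintf("xx%02d", int((NR - 1) / n))
    # Write the empty line owed by the previous record into the file
    # that record went to; the final record never gets one.
    if (pending != "") {
        print "" > pending
        if (pending != out) close(pending)   # finished with that file
    }
    print > out
    pending = out
}' data.txt
```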
Upvotes: 2
Reputation: 29290
With awk (tested with GNU and BSD awk):
awk -v max=3 '{print > sprintf("xx%02d", int(n/max))} /^$/ {n += 1}' file
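Here print runs before the /^$/ rule increments n, so each third empty line is written to the end of its file before the file number changes. Every chunk file stays open until awk exits, which on an input with many chunks may hit an open-file limit in some awk implementations. A sketch of the same command that closes each file once its third empty line has been written (out and the close() call are my additions; data.txt stands in for the real input):

```shell
# Scratch directory plus the sample input from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Lorem ipsum' 'Lorem ipsum' 'Lorem ipsum' '' \
  'Lorem ipsum' 'Lorem ipsum' '' 'Lorem ipsum' '' \
  'Lorem ipsum' 'Lorem ipsum' '' 'Lorem' 'Lorem' > data.txt

awk -v max=3 '
    { out = sprintf("xx%02d", int(n / max)); print > out }
    /^$/ { n += 1; if (n % max == 0) close(out) }   # chunk complete: release it
' data.txt
```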
Upvotes: 2