Reputation: 2214
I have the following test file:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
I want to separate it in a way that each file contains the last line of the previous file as the first line. The example would be:
file 1:
1
2
3
4
5
file2:
5
6
7
8
9
file3:
9
10
11
12
13
file4:
13
14
15
16
17
file5:
17
18
19
20
That would make 4 files with 5 lines and 1 file with 4 lines.
As a first step, I tried to test the following commands I wrote to get only the first file which contains the first 5 lines. I can't figure out why the awk
command in the if
statement, instead of printing the first 5 lines, it prints the whole 20?
d=$(wc test)
a=$(echo $d | cut -f1 -d " ")
lines=$(echo $a/5 | bc -l)
integer=$(echo $lines | cut -f1 -d ".")
for i in $(seq 1 $integer); do
start=$(echo $i*5 | bc -l)
var=$((var+=1))
echo start $start
echo $var
if [[ $var = 1 ]]; then
awk 'NR<=$start' test
fi
done
Thanks!
Upvotes: 0
Views: 72
Reputation: 103714
Use split
:
$ seq 20 | split -l 5
$ for fn in x*; do echo "$fn"; cat "$fn"; done
xaa
1
2
3
4
5
xab
6
7
8
9
10
xac
11
12
13
14
15
xad
16
17
18
19
20
Or, if you have a file:
$ split -l test_file
Upvotes: 0
Reputation: 999
You could improve your code by removing the unneccesary echo
cut
and bc
and do it like this
#!/bin/bash
for i in $(seq $(wc -l < test) ); do
(( i % 4 != 1 )) && continue
tail +$i test | head -5 > "file$(( 1+i/4 ))"
done
But still the awk solution is much better. Reading the file only once and taking actions based on readily available information (like the linenumber) is the way to go. In shell you have to count the lines, there is no way around it. awk
will give you that (and a lot of other things) for free.
Upvotes: 0
Reputation: 203169
$ ls
$
$ seq 20 | awk 'NR%4==1{ if (out) { print > out; close(out) } out="file"++c } {print > out}'
$
$ ls
file1 file2 file3 file4 file5
.
$ cat file1
1
2
3
4
5
$ cat file2
5
6
7
8
9
$ cat file3
9
10
11
12
13
$ cat file4
13
14
15
16
17
$ cat file5
17
18
19
20
If you're ever tempted to use a shell loop to manipulate text again, make sure to read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice first to understand at least some of the reasons to use awk instead. To learn awk, get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
oh. and wrt why your awk command awk 'NR<=$start' test
didn't work - awk is not shell, it has no more access to shell variables (or vice-versa) than a C program does. To init an awk variable named awkstart
with the value of a shell variable named start
and then use that awk variable in your script you'd do awk -v awkstart="$start" 'NR<=awkstart' test
. The awk variable can also be named start
or anything else sensible - it is completely unrelated to the name of the shell variable.
Upvotes: 2
Reputation: 85530
Why not just use the split
util available from your POSIX
toolkit. It has an option to split on number of lines which you can give it as 5
split -l 5 input-file
From the man split
page,
-l, --lines=NUMBER
put NUMBER lines/records per output file
Note that, -l
is POSIX
compliant also.
Upvotes: 3