Homap

Reputation: 2214

Dividing one file into separate files based on line numbers

I have the following test file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

I want to separate it in a way that each file contains the last line of the previous file as the first line. The example would be:

file1:
1
2
3
4
5
file2: 
5
6
7
8
9
file3:
9
10
11
12
13
file4:
13
14
15
16
17
file5:
17
18
19
20

That would make 4 files with 5 lines and 1 file with 4 lines.

As a first step, I tried the following commands to produce only the first file, which should contain the first 5 lines. I can't figure out why the awk command in the if statement prints all 20 lines instead of just the first 5.

d=$(wc test)
a=$(echo $d | cut -f1 -d " ")
lines=$(echo $a/5 | bc -l)
integer=$(echo $lines | cut -f1 -d ".")
for i in $(seq 1 $integer); do
start=$(echo $i*5 | bc -l)
var=$((var+=1))
echo start $start
echo $var
if [[ $var = 1 ]]; then
    awk 'NR<=$start' test
fi
done

Thanks!

Upvotes: 0

Views: 72

Answers (4)

dawg

Reputation: 103714

Use split:

$ seq 20 | split -l 5
$ for fn in x*; do echo "$fn"; cat "$fn"; done
xaa
1
2
3
4
5
xab
6
7
8
9
10
xac
11
12
13
14
15
xad
16
17
18
19
20

Or, if you have a file:

$ split -l 5 test_file

Upvotes: 0

ULick

Reputation: 999

You could improve your code by removing the unnecessary echo, cut, and bc calls and doing it like this:

#!/bin/bash
for i in $(seq $(wc -l < test) ); do
    (( i % 4 != 1 )) && continue
    tail -n +"$i" test | head -n 5 > "file$(( 1+i/4 ))"
done

Still, the awk solution is much better. Reading the file only once and acting on readily available information (like the line number) is the way to go. In shell you have to count the lines yourself; there is no way around it. awk gives you that (and a lot of other things) for free.
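As a rough sketch of the same tail/head idea, the loop can jump straight to each starting line instead of visiting every line number. The names workdir, size, step, and n are illustrative and not part of the answer above; the scratch directory and size=5 are assumptions made so the example is self-contained.

```shell
#!/bin/bash
# Recreate the 20-line test file from the question in a scratch directory.
workdir=$(mktemp -d)
seq 20 > "$workdir/test"

size=5               # lines per output file (assumed, as in the question)
step=$((size - 1))   # advance one line less so the boundary line repeats
total=$(wc -l < "$workdir/test")

n=0
for ((i = 1; i <= total; i += step)); do
    n=$((n + 1))
    # Start at line i and keep at most $size lines.
    tail -n +"$i" "$workdir/test" | head -n "$size" > "$workdir/file$n"
done

ls "$workdir"
```

This still rereads the input once per output file, so the point about awk reading the file only once continues to hold.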

Upvotes: 0

Ed Morton

Reputation: 203169

$ ls
$
$ seq 20 | awk 'NR%4==1{ if (out) { print > out; close(out) } out="file"++c } {print > out}'
$
$ ls
file1  file2  file3  file4  file5


$ cat file1
1
2
3
4
5
$ cat file2
5
6
7
8
9
$ cat file3
9
10
11
12
13
$ cat file4
13
14
15
16
17
$ cat file5
17
18
19
20
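For readers who find the one-liner dense, here is the same logic spread out with comments. This is only a sketch: the size and dir variables and the scratch directory are additions for illustration, and it assumes size is at least 2.

```shell
#!/bin/bash
# Scratch directory so the sketch does not touch existing files (assumption).
workdir=$(mktemp -d)

seq 20 | awk -v size=5 -v dir="$workdir" '
    # True on lines 1, 5, 9, ... when size is 5: each of these lines
    # begins a new output file.
    (NR - 1) % (size - 1) == 0 {
        if (out) {
            # Not the first line of input: this line also ends the
            # file that is currently open.
            print > out
            close(out)
        }
        out = dir "/file" (++c)
    }
    # Every line, including a boundary line, goes to the current file.
    { print > out }
'

ls "$workdir"
```

With size=5 the condition is equivalent to NR%4==1 in the one-liner; writing it as (NR - 1) % (size - 1) just keeps it valid for other chunk sizes.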

If you're ever tempted to use a shell loop to manipulate text again, make sure to read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice first to understand at least some of the reasons to use awk instead. To learn awk, get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.

Oh, and as for why your awk command awk 'NR<=$start' test didn't work: awk is not shell, and it has no more access to shell variables (or vice versa) than a C program does. To initialize an awk variable named awkstart with the value of a shell variable named start and then use that awk variable in your script, you'd write awk -v awkstart="$start" 'NR<=awkstart' test. The awk variable could also be named start or anything else sensible; its name is completely unrelated to the name of the shell variable.
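A small sketch comparing the two commands may help. Inside single quotes the shell never expands $start, so awk sees its own uninitialized variable start (which evaluates to 0) and reads $start as $0, the whole line. Since each line of this particular test file holds its own line number, NR <= $0 is true every time, which is why all 20 lines came out. The workdir, broken, and fixed names below are illustrative only.

```shell
#!/bin/bash
# Recreate the 20-line test file from the question in a scratch directory.
workdir=$(mktemp -d)
seq 20 > "$workdir/test"

start=5   # shell variable, as in the question

# awk never sees the shell value here: $start means "field number start",
# and the awk variable start is unset, so this compares NR with $0.
broken=$(awk 'NR<=$start' "$workdir/test" | wc -l | tr -d ' ')

# -v copies the shell value into an awk variable before the script runs.
fixed=$(awk -v awkstart="$start" 'NR<=awkstart' "$workdir/test" | wc -l | tr -d ' ')

echo "without -v: $broken lines, with -v: $fixed lines"
# prints "without -v: 20 lines, with -v: 5 lines"
```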

Upvotes: 2

Inian

Reputation: 85530

Why not just use the split utility available in your POSIX toolkit? It has an option to split on a number of lines, which you can set to 5:

split -l 5 input-file

From the split man page:

-l, --lines=NUMBER
       put NUMBER lines/records per output file

Note that -l is also POSIX-compliant.
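On its own, split -l 5 produces chunks that do not overlap, so the repeated boundary line asked for in the question needs one more step. One possible approach, offered only as a sketch, is to split into 4-line chunks and then append the first line of each chunk to the chunk before it. The part. prefix, the scratch directory, and the prev variable are assumptions for illustration.

```shell
#!/bin/bash
# Recreate the 20-line input in a scratch directory.
workdir=$(mktemp -d)
seq 20 > "$workdir/input-file"

# 4 lines per chunk: part.aa holds 1-4, part.ab holds 5-8, and so on.
split -l 4 "$workdir/input-file" "$workdir/part."

prev=
for f in "$workdir"/part.*; do
    if [ -n "$prev" ]; then
        # Copy this chunk's first line onto the end of the previous chunk.
        head -n 1 "$f" >> "$prev"
    fi
    prev=$f
done

cat "$workdir/part.aa"
```

The glob expands in sorted order, which matches the order in which split names its output files.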

Upvotes: 3
