Reputation: 348
I have a big file with 1000 lines. I want to get 110 lines from it. The lines should be evenly spread across the input file.
For example, reading 4 lines from a file with 10 lines:
Input file:
1 2 3 4 5 6 7 8 9 10
Output file:
1 4 7 10
Upvotes: 4
Views: 131
Reputation: 16039
Use:
sed -n '1~9p' < file
The -n option stops sed from printing every line by default. '1~9p' tells sed to print every 9th line starting at line 1 (the p at the end is the print command). Note that the first~step address form is a GNU sed extension.
To get closer to 110 lines you have to print every 9th line (1000/110 ~ 9).
Update: This prints 112 lines. If you need exactly 110 lines, you can limit the output with head:
sed -n '1~9p' < file | head -n 110
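Since the 1~9 address form is specific to GNU sed, here is a minimal sketch of the same selection using awk, which may be useful on systems without GNU sed. The helper name every_nth is hypothetical, and seq is used only to generate numbered test input.

```shell
# every_nth STEP: print line 1 and every STEP-th line after it, reading stdin.
# Hypothetical helper; a portable stand-in for GNU sed's '1~STEPp'.
every_nth() {
    awk -v step="$1" '(NR - 1) % step == 0'
}

# 1000 numbered lines, every 9th line, capped at 110 lines of output.
seq 1000 | every_nth 9 | head -n 110 | wc -l
```

As with the sed version, the step of 9 alone selects 112 lines, so head is still needed to cap the output at 110.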
Upvotes: 4
Reputation: 12675
I often like to use a combination of shell and awk for these sorts of things:
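#!/bin/bash
# Usage: spread.sh FILE COUNT
filename=$1
toprint=$2
# Pass the total line count and the requested count to awk.
awk -v tot="$(wc -l < "$filename")" -v toprint="$toprint" '
BEGIN {
    # Step between printed lines; kept at 1 or more to avoid division by zero.
    interval = (toprint > 1) ? int((tot-1)/(toprint-1)) : 1
    if (interval < 1) interval = 1
}
(NR-1) % interval == 0 { print; nbr++ }
nbr == toprint { exit }
' "$filename"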
Some examples:
$ ./spread.sh 1001lines 5
1
251
501
751
1001
$ ./spread.sh 1000lines 110 |head -n 3
1
10
19
$ ./spread.sh 1000lines 110 |tail -n 3
964
973
982
Upvotes: 2
Reputation: 203665
$ cat tst.awk
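NR==FNR { next }                     # first pass: only count the lines
FNR==1 { mod = int((NR-1)/tgt) }     # second pass: NR-1 is the total line count
!( (FNR-1)%mod ) { print; cnt++ }    # print line 1 and every mod-th line after it
cnt == tgt { exit }                  # stop once tgt lines have been printed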
$ wc -l file1
1000 file1
$ awk -v tgt=110 -f tst.awk file1 file1 > file2
$ wc -l file2
110 file2
$ head -5 file2
1
10
19
28
37
$ tail -5 file2
946
955
964
973
982
Note that this will not produce the output you posted in your question given your posted input file, because that would require an algorithm that doesn't always use the same interval between output lines. You could dynamically calculate mod and adjust it as you parse your input file if you like, but the above may be good enough.
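One possible way to vary the interval is to compute each target line number directly from its rank, so the first and last lines of the file are always included. The following is only a sketch under that assumption, not the method used above; the function name spread_exact and the variables want and k are hypothetical.

```shell
# spread_exact FILE COUNT: print COUNT lines spread over the whole of FILE.
# Hypothetical helper; reads FILE twice (once to count, once to print).
spread_exact() {
    awk -v tgt="$2" '
        NR == FNR { tot = NR; next }     # first pass: remember the line count
        FNR == 1  { k = 0; want = 1 }    # second pass: the first target is line 1
        FNR == want {
            print
            # Advance to the next target line; skip targets that would
            # repeat a line already passed (happens when tgt > tot).
            do {
                if (++k >= tgt) exit
                want = 1 + int(k * (tot - 1) / (tgt - 1))
            } while (want <= FNR)
        }
    ' "$1" "$1"
}
```

With the 10-line input from the question and a count of 4, this selects lines 1, 4, 7 and 10. With 1000 lines and a count of 110, it prints exactly 110 lines and ends on line 1000, at the cost of gaps that alternate between 9 and 10 lines.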
Upvotes: 3
Reputation: 33327
With awk you can do:
awk -v interval=3 '(NR-1)%interval==0' file
where interval is the number of lines between consecutive printed lines. It is roughly the total number of lines in the file divided by the number of lines you want printed.
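As a rough sketch of that division, the interval can be derived from wc -l and the wanted count with shell integer arithmetic. The function name spread_by_interval is hypothetical. Because integer division rounds down, this may print somewhat more lines than requested.

```shell
# spread_by_interval FILE WANTED: print every interval-th line of FILE,
# where interval = total lines / WANTED (integer division, at least 1).
# Hypothetical helper name, for illustration only.
spread_by_interval() {
    file=$1
    wanted=$2
    total=$(wc -l < "$file")
    interval=$(( total / wanted ))
    [ "$interval" -lt 1 ] && interval=1
    awk -v interval="$interval" '(NR-1) % interval == 0' "$file"
}
```

For a 1000-line file and 110 wanted lines the interval is 9, which yields 112 lines; piping the result through head -n 110 trims it to the exact count.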
Upvotes: 2