abilng

Reputation: 348

Get n evenly spaced lines from a file

I have a big file with 1000 lines. I want to get 110 lines from it. The lines should be evenly spread across the input file.

For example, reading 4 lines from a file with 10 lines:

Input File

1
2
3
4
5
6
7
8
9
10

Output file

1
4
7
10

Upvotes: 4

Views: 131

Answers (4)

dinox0r

Reputation: 16039

Use:

sed -n '1~9p' < file

The -n option suppresses sed's default output, so only lines printed explicitly with p appear. The address 1~9 selects line 1 and every 9th line after it (the first~step form is a GNU sed extension), and the trailing p prints each selected line.

To get close to 110 lines, print every 9th line, since 1000/110 ≈ 9.


Update: This answer prints 112 lines. If you need exactly 110 lines, you can limit the output with head like this:

sed -n '1~9p' < file | head -n 110
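
The figure of 112 can be checked with integer arithmetic: starting at line 1 with a step of 9, the printed lines are 1, 10, 19, and so on up to the largest 1 + 9k that does not exceed 1000. A minimal sketch, where count_selected is a hypothetical helper name used only for illustration and the total is assumed to be at least 1:

```shell
# count_selected TOTAL STEP
# Number of lines selected when printing line 1 and then every STEP-th
# line after it, from a file of TOTAL lines (assumes TOTAL >= 1).
count_selected() {
    echo $(( ($1 - 1) / $2 + 1 ))   # shell integer division rounds down
}

count_selected 1000 9   # prints 112
count_selected 10 3     # prints 4
```

Since a step of 9 overshoots the target by two lines, piping through head -n 110 trims the excess.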

Upvotes: 4

Zac Thompson

Reputation: 12675

I often like to use a combination of shell and awk for these sorts of things:

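#!/bin/bash

filename=$1   # input file
toprint=$2    # number of lines to print (at least 2, at most the line count)

# tot is the line count of the file; the interval is rounded down
# so that toprint lines always fit before the end of the file.
awk -v tot="$(wc -l < "$filename")" -v toprint="$toprint" '
BEGIN{ interval=int((tot-1)/(toprint-1)) }

(NR-1)%interval==0 {
    print;
    nbr++
}

nbr==toprint{exit}

' "$filename"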

Some examples:

$ ./spread.sh 1001lines 5
1
251
501
751
1001
$ ./spread.sh 1000lines 110 |head -n 3
1
10
19
$ ./spread.sh 1000lines 110 |tail -n 3
964
973
982

Upvotes: 2

Ed Morton

Reputation: 203665

$ cat tst.awk
NR==FNR { next }
FNR==1 { mod = int((NR-1)/tgt) }
!( (FNR-1)%mod ) { print; cnt++ }
cnt == tgt { exit }

$ wc -l file1
1000 file1

$ awk -v tgt=110 -f tst.awk file1 file1 > file2

$ wc -l file2
110 file2

$ head -5 file2
1
10
19
28
37

$ tail -5 file2
946
955
964
973
982

Note that this will not produce the output you posted in your question given your posted input file: for that input mod is rounded down to 2, so the script prints 1, 3, 5, 7. Always ending on the last line would, in general, require an algorithm that doesn't always use the same interval between output lines. You could dynamically calculate mod and adjust it as you parse your input file if you like, but the above may be good enough.
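
One way to sketch the variable-interval idea is to compute each output position directly instead of using a fixed mod: the i-th printed line (counting from 0) is taken at 1 + int(i*(total-1)/(tgt-1)), so the first and last lines of the file are always included. This is only an illustration under stated assumptions, not the method of the script above: spread_lines is a hypothetical helper name, the input must be a regular non-empty file (it is read twice, as in tst.awk), and positions are rounded down.

```shell
# spread_lines FILE N
# Print N lines of FILE spread from its first line to its last line.
spread_lines() {
    awk -v tgt="$2" '
        NR == FNR { total = NR; next }     # first pass: count the lines
        FNR == 1 {
            if (tgt > total) tgt = total   # cannot print more lines than exist
            cnt = 0
            want = 1                       # the first line is always selected
        }
        cnt < tgt && FNR == want {
            print
            cnt++
            # position of the next line to print, rounded down
            want = (tgt > 1) ? 1 + int(cnt * (total - 1) / (tgt - 1)) : 0
        }
        cnt >= tgt { exit }
    ' "$1" "$1"
}

# Usage: spread_lines file1 110 > file2
```

For the 10-line input in the question and N=4 this selects lines 1, 4, 7 and 10. For 1000 lines and N=110 it prints exactly 110 lines, from line 1 to line 1000, with gaps of 9 or 10 between them.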

Upvotes: 3

user000001

Reputation: 33327

With awk you can do:

 awk -v interval=3 '(NR-1)%interval==0' file

where interval is the difference in line number between consecutive printed lines. Its value is roughly the total number of lines in the file divided by the number of lines you want printed.
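
The interval does not have to be worked out by hand. A minimal sketch that derives it from the line count, assuming the requested count is at least 1 (print_spaced is a hypothetical helper name, not part of the command above). Because the division rounds down, the interval can select more lines than requested, so the awk program stops once enough lines have been printed.

```shell
# print_spaced FILE WANT
# Derive the interval from the line count of FILE, then print every
# interval-th line starting at line 1, stopping after WANT lines.
print_spaced() {
    total=$(( $(wc -l < "$1") ))          # arithmetic strips any padding from wc
    interval=$(( total / $2 ))            # integer division rounds down
    [ "$interval" -ge 1 ] || interval=1   # more lines requested than exist
    awk -v interval="$interval" -v want="$2" '
        (NR - 1) % interval == 0 { print; if (++n == want) exit }
    ' "$1"
}

# Usage: print_spaced file 110
```

With 1000 lines and 110 requested, the interval is 9 and the last line printed is line 982.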

Upvotes: 2
