linguist3930

Reputation: 41

Delete lines from a text file except the first and every nth

I have a long text file consisting of numbers, such as:

1
2
9.252
9.252
9.272
1
1
6.11
6.11
6.129

I would like to keep the first line, delete the next three, and keep the one after that, then repeat this for every group of five lines in the file. Following that logic, given the input above, I would like the following output:

1
9.272
1
6.129
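To try the commands in the answers, the sample input can be recreated with printf. The file name `file` is an assumption (some answers below use input.txt or textfile.txt instead):

```shell
# Write the ten sample numbers, one per line, to a file named "file".
printf '%s\n' 1 2 9.252 9.252 9.272 1 1 6.11 6.11 6.129 > file
# Count the lines written (should be 10).
wc -l < file
```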

Upvotes: 2

Views: 213

Answers (6)

potong

Reputation: 58410

This might work for you (GNU sed):

sed '2~5,+2d' file

Starting from line 2 and then every 5th line after it (lines 2, 7, 12, ...), delete that line and the two lines following it, i.e. three lines out of every five.
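To see which lines the GNU-only `first~step` address picks, it can be run on plain line numbers, with `seq` standing in for the real file:

```shell
# 2~5 matches lines 2, 7, 12, ...: the start of each deleted run.
seq 12 | sed -n '2~5p'
# ",+2" extends each match by two lines (2-4, 7-9, 12-14), so
# deleting those runs leaves lines 1, 5, 6, 10 and 11.
seq 12 | sed '2~5,+2d'
```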

An alternative, which prints line 1 and then every 5th line together with the line after it:

sed -n '1p;5~5,+1p' file
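A quick sanity check, assuming GNU sed, that the two commands in this answer select the same lines (line numbers from `seq` stand in for the file's contents):

```shell
# Run both GNU sed commands on lines 1..100 and compare their output.
first=$(seq 100 | sed '2~5,+2d')
second=$(seq 100 | sed -n '1p;5~5,+1p')
if [ "$first" = "$second" ]; then
    echo "identical"
else
    echo "different"
fi
```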

Upvotes: 1

tshiono

Reputation: 22012

You can simply say:

awk 'NR%5<2' input.txt

Explanation: since the pattern repeats every five lines, apply the modulo operation by five to the line number NR. The 1st line of each five-line block yields 1 and the 5th line yields 0, while the 2nd to 4th lines yield 2, 3 and 4, so comparing the result with 2 separates the lines to keep from the others.
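Printing NR%5 next to each line number makes the comparison concrete; `seq 10` stands in for the file here:

```shell
# Show the line number, its remainder mod 5, and whether NR%5<2 keeps it.
seq 10 | awk '{ r = NR % 5; print NR, r, (r < 2 ? "keep" : "drop") }'
```

Lines 1, 5, 6 and 10 come out marked "keep", which in the sample input are 1, 9.272, 1 and 6.129.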

Upvotes: 3

Ed Morton

Reputation: 203502

To print the 1st and 5th line of every block of 5 lines (remember that 5%5 = 0):

$ awk '(NR%5) ~ /[10]/' file
1
9.272
1
6.129

If you want to print the 2nd, 3rd, and 4th line of every block of 5 lines instead of the 1st and 5th:

$ awk '(NR%5) ~ /[234]/' file
2
9.252
9.252
1
6.11
6.11

If you wanted to print the 27th and 53rd line of every block of 100:

awk '(NR%100) ~ /^(27|53)$/' file

We couldn't use a bracket expression there since the numbers now have more than one digit: /[27]/, for example, would match any remainder containing a 2 or a 7.
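If the block size or the positions change often, one option is to pass them in with `-v` instead of editing a regex. This is only a sketch: the variable names `n` and `keep` are made up for illustration, and `seq` stands in for the file:

```shell
# n: block size; keep: space-separated 1-based positions within a block.
# A position equal to n is stored as n % n = 0, because NR % n is 0
# on the last line of each block.
seq 200 | awk -v n=100 -v keep='27 53' '
    BEGIN {
        cnt = split(keep, pos, " ")
        for (j = 1; j <= cnt; j++)
            want[pos[j] % n] = 1
    }
    { if ((NR % n) in want) print }
'
```

This prints 27, 53, 127 and 153; with `-v n=5 -v keep='1 5'` it makes the same selection as the question asks for.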

Upvotes: 2

PesaThe

Reputation: 7499

Using GNU sed (needed for the ~ extension):

sed -n '1~5p;5~5p' file
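Since `first~step` is a GNU extension, a plain awk condition is one possible substitute where sed lacks it: `1~5` selects lines with NR % 5 == 1 and `5~5` selects lines with NR % 5 == 0. A sketch that checks this translation against GNU sed on `seq` output (so the check itself still assumes GNU sed is available):

```shell
# GNU sed: 1~5 selects lines 1, 6, 11, ...; 5~5 selects lines 5, 10, 15, ...
gnu=$(seq 50 | sed -n '1~5p;5~5p')
# Plain awk translation of the same two addresses.
plain=$(seq 50 | awk 'NR % 5 == 1 || NR % 5 == 0')
[ "$gnu" = "$plain" ] && echo "same lines selected"
```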

Upvotes: 5

vdavid

Reputation: 2544

Since your lines come in groups of 5, you could use awk with a modulo-5 operation.

awk '{i=(NR-1)%5;if(i==0||i==4)print $0}' input.txt

With indentation it looks like this:

{
  i=(NR-1)%5;
  if (i==0||i==4)
    print $0;
}

i=(NR-1)%5 takes the line number modulo 5. Since line numbers start at 1 (instead of 0), you need to subtract 1 from it before computing the modulo.

This leaves you with an integer i that ranges from 0 to 4. You want to print the first line (index 0), skip the next three lines (indexes 1-3) and print the last line (index 4), which is exactly what if (i==0||i==4) print $0 does.

Alternatively, you can do the same thing with a shorter version:

awk '((NR-1)%5==0||(NR-1)%5==4)' input.txt

This tells awk to act on the 1st and the 5th line of every group of 5. Since no action is given, awk uses its default action, which prints the current line. If it helps, this is strictly equivalent to:

awk '((NR-1)%5==0||(NR-1)%5==4){print $0}' input.txt
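The same index idea extends to groups of any size by passing the size in with `-v`. In this sketch `n` is a hypothetical variable name and `seq` stands in for the input file:

```shell
# i is the 0-based position inside each group of n lines.
# The second rule has no action, so awk prints lines where i is 0 (first)
# or n-1 (last).
seq 12 | awk -v n=5 '{ i = (NR - 1) % n } i == 0 || i == n - 1'
```

With n=5 this prints 1, 5, 6, 10 and 11; with n=3 on `seq 7` it prints 1, 3, 4, 6 and 7.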

Upvotes: 0

KamilCuk

Reputation: 141010

With your numbers saved in textfile.txt, you can use the following with sed:

sed -n 'p;n;n;n;n;p;' textfile.txt

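With -n, sed prints nothing automatically: p prints the first line, the four n commands read the next four lines without printing them, and the final p prints the 5th line of the group.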
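The same command written out over several lines with comments, as a sketch assuming a sed that accepts `#` comments inside the script (GNU sed does); `seq 12` stands in for textfile.txt:

```shell
seq 12 | sed -n '
  # Print the 1st line of the block (-n turns off automatic printing).
  p
  # Read lines 2-5 of the block. With -n, each n replaces the pattern
  # space without printing it.
  n
  n
  n
  n
  # Print the 5th line. The next cycle starts with the next block.
  p
'
```

This prints 1, 5, 6, 10 and 11: line 11 starts an incomplete block, so it is printed by the first p before n runs out of input and sed exits.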

Or the following using while read in bash:

while read -r firstline && read -r nextone1 && read -r nextone2 && read -r nextone3 && read -r lastone; do 
    printf "%s\n" "$firstline" "$lastone"; 
done < textfile.txt

This just reads 5 lines at a time and prints only the first and 5th lines.
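Unlike `sed -n 'p;n;n;n;n;p'`, which still prints the first line of an incomplete final block, this `&&` chain stops as soon as one read fails, so an incomplete final block is dropped entirely. A quick demonstration with `seq 12` in place of textfile.txt:

```shell
# Lines 11-12 form an incomplete block: the third read fails,
# the loop ends, and neither line is printed.
seq 12 | while read -r firstline && read -r nextone1 && read -r nextone2 &&
        read -r nextone3 && read -r lastone; do
    printf '%s\n' "$firstline" "$lastone"
done
```

This prints 1, 5, 6 and 10. Whether line 11 should be kept in that case depends on what you want.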

Upvotes: 4
