nytrook

Reputation: 61

Removing duplicate lines that occur every certain number of lines using sed or another tool

I have a huge file and I would like to remove duplicate lines, but only within each group of 3 lines. Is this possible using sed or a similar command?

My file looks like this:

this is text

1234

1234

this is another text

5678

5678

The second number is a duplicate of the first, and I would like to remove it (the third line) from each group of 3 lines in the file. The reason I'm not using less filename | uniq is that numbers might repeat elsewhere in the file (outside the 3-line group), and I don't want those removed.

Thanks

Upvotes: 0

Views: 87

Answers (4)

potong

Reputation: 58578

This might work for you (GNU sed):

sed -r 'n;$!N;s/^([^\n]*)\n\1$/\1/' file

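For each group of three lines: n prints the first line and loads the second, $!N appends the third (unless the second is already the last line of the file), and the substitution drops the third line if it is a duplicate of the second.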
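To check the behaviour, the command can be run on a small sample in which the same number also shows up in a later group. This is only a sketch: it assumes GNU sed and input without blank lines between the lines, and dedup_third is a hypothetical wrapper name, not part of the answer.

```shell
# Hypothetical wrapper around the sed command above (GNU sed assumed).
dedup_third() {
    # n    : print line 1 of the group, then load line 2
    # $!N  : append line 3 unless the current line is already the last one
    # s/// : if line 3 is identical to line 2, keep a single copy
    sed -r 'n;$!N;s/^([^\n]*)\n\1$/\1/' "$@"
}

# 1234 appears in both groups; only the duplicate inside each group is dropped.
printf '%s\n' 'this is text' 1234 1234 'this is another text' 1234 1234 | dedup_third
# prints:
#   this is text
#   1234
#   this is another text
#   1234
```

A group whose third line differs from its second (for example foo, bar, qux) passes through unchanged.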

Upvotes: 1

glenn jackman

Reputation: 247210

Making assumptions about what your input really is and what you want to output:

awk 'NR%3 == 2 {val=$0} NR%3 == 0 && $0 == val {next} 1' <<END
this is text
1234
1234
this is another text
5678
5678
foo
bar
qux
END
this is text
1234
this is another text
5678
foo
bar
qux
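Since the title asks about "every certain number of lines", the same idea can be sketched with the group size as a parameter. Everything here is an assumption layered on the answer above: dedup_groups is a hypothetical name, n is assumed to be at least 2, and the last line of each group is compared with the line just before it. Appending "" when storing the line forces a string comparison, because awk otherwise compares numeric-looking input lines as numbers (so 10 and 10.0 would count as equal).

```shell
# Hypothetical helper: drop the last line of every group of n lines
# when it repeats the line just before it (n >= 2 assumed).
dedup_groups() {
    awk -v n="$1" '
        NR % n == 0 && $0 == prev { next }   # last line of a group, identical to the previous line
        { prev = $0 ""; print }              # remember the line as a string, then print it
    '
}

printf '%s\n' 'this is text' 1234 1234 foo bar qux | dedup_groups 3
# prints:
#   this is text
#   1234
#   foo
#   bar
#   qux
```

Calling dedup_groups 3 should give the same result as the one-liner above for inputs like the sample; other values of n simply move the position that is checked.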

Upvotes: 0

ldav1s

Reputation: 16315

The uniq utility only filters out adjacent duplicate lines, so a number that repeats elsewhere in the file is kept (does your input really have a blank line between each line?). If it doesn't, uniq could be used. Given this input.txt:

this is text
1234
1234
this is another text
1234
1234

uniq input.txt gives:

this is text
1234
this is another text
1234
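One caveat, sketched here with a made-up sample: uniq has no notion of line position, so if the last line of one group happens to equal the first line of the next group, that pair is collapsed too. The awk filter shown for comparison is the position-aware one-liner from the other answer in this thread; sample is a hypothetical helper that only prints test data.

```shell
# Made-up sample: line 3 ("x") and line 4 ("x") belong to different groups.
sample() { printf '%s\n' a 1 x x 2 2; }

# uniq collapses every adjacent pair, including the one across the group boundary.
sample | uniq
# prints: a 1 x 2   (one per line)

# A position-aware filter only ever removes the third line of a group.
sample | awk 'NR%3 == 2 {val=$0} NR%3 == 0 && $0 == val {next} 1'
# prints: a 1 x x 2   (one per line)
```

Whether that difference matters depends on whether such cross-group repeats can occur in the real file.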

Upvotes: 1

Fredrik Pihl

Reputation: 45670

Does this solve your issue?

$ awk 'NR%3!=0' input
this is text


1234
this is another text


5678

Using sed (GNU sed; the first~step address form is a GNU extension):

$ sed '0~3d' input
this is text


1234
this is another text


5678

Perl:

$ perl -n -e '$.%3!=0&&print' input
this is text


1234
this is another text


5678

But, then again, I might have misinterpreted the question...
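If the blank lines shown in the question really are in the file, deleting every third physical line removes the first copy of each number and leaves the blank lines behind, as the output above shows. A possible sketch for that case follows. It assumes the blank lines carry no information and can be dropped from the output, and dedup_skip_blank is a hypothetical name.

```shell
# Hypothetical helper: ignore blank lines, then treat every three remaining
# lines as one group and drop the third if it repeats the second.
dedup_skip_blank() {
    awk '
        !NF { next }                              # skip empty or whitespace-only lines
        ++count % 3 == 0 && $0 == prev { next }   # third line of a group, identical to the second
        { prev = $0 ""; print }                   # store as a string so the comparison is textual
    ' "$@"
}

printf '%s\n' 'this is text' '' 1234 '' 1234 '' 'this is another text' '' 5678 '' 5678 | dedup_skip_blank
# prints:
#   this is text
#   1234
#   this is another text
#   5678
```

Unlike the commands above, this only removes the third line of a group when it actually matches the second.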

Upvotes: 1
