Reputation: 61
I have a huge file and I would like to remove duplicate lines, but only when the duplicate occurs within each group of 3 lines. Is it possible using sed or a similar command?
My file looks like this:
this is text
1234
1234
this is another text
5678
5678
The second number is a duplicate of the first, and I would like to remove the second occurrence (the third line) in each group of 3 lines. The reason I'm not using less filename | uniq is that numbers might repeat elsewhere in the file (outside the 3-line group), and I don't want those removed.
Thanks
Upvotes: 0
Views: 87
Reputation: 58578
This might work for you (GNU sed):
sed -r 'n;$!N;s/^([^\n]*)\n\1$/\1/' file
Print the first line of three and delete the third line if it is a duplicate of the second.
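To illustrate (assuming GNU sed, which is what the -r flag requires), here is a sketch running that command on the sample input from the question:

```shell
# n prints line 1 of each group and fetches line 2; N appends line 3;
# the substitution keeps only one copy when lines 2 and 3 are identical.
printf 'this is text\n1234\n1234\nthis is another text\n5678\n5678\n' |
sed -r 'n;$!N;s/^([^\n]*)\n\1$/\1/'
# prints:
# this is text
# 1234
# this is another text
# 5678
```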
Upvotes: 1
Reputation: 247210
Making assumptions about what your input really is and what you want to output:
awk 'NR%3 == 2 {val=$0} NR%3 == 0 && $0 == val {next} 1' <<END
this is text
1234
1234
this is another text
5678
5678
foo
bar
qux
END
this is text
1234
this is another text
5678
foo
bar
qux
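To check the behaviour the question specifically cares about (a number repeating later in the file, outside its own 3-line group), here is a sketch with a hypothetical input where 1234 reappears as the third line of a later group but is not a duplicate of that group's second line:

```shell
# Only the in-group duplicate is removed; the later 1234 survives because
# it does not match the value remembered from its own group's second line.
printf 'this is text\n1234\n1234\nother text\n9999\n1234\n' |
awk 'NR%3 == 2 {val=$0} NR%3 == 0 && $0 == val {next} 1'
# prints:
# this is text
# 1234
# other text
# 9999
# 1234
```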
Upvotes: 0
Reputation: 16315
The uniq utility only filters out adjacent duplicate lines (does your input really have a blank line between each line?). Otherwise it could be used:
this is text
1234
1234
this is another text
1234
1234
uniq input.txt
gives:
this is text
1234
this is another text
1234
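A quick sketch of the adjacency point: uniq collapses only consecutive repeats, so a value that recurs elsewhere in the file is left alone (which is what the question wants).

```shell
# The final 1234 is kept: it is not adjacent to another 1234.
printf '1234\n1234\nsome text\n1234\n' | uniq
# prints:
# 1234
# some text
# 1234
```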
Upvotes: 1
Reputation: 45670
Does this solve your issue?
$ awk 'NR%3!=0' input
this is text
1234
this is another text
5678
Using sed:
$ sed '0~3d' input
this is text
1234
this is another text
5678
Perl:
$ perl -n -e '$.%3!=0&&print' input
this is text
1234
this is another text
5678
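Note that all three commands above delete every third line unconditionally, without checking that it is actually a duplicate of the line before it. A sketch of the difference, using a hypothetical group whose third line is unique:

```shell
# NR%3!=0 drops line 3 of each group even when it is not a repeat,
# so 'different' is lost here.
printf 'this is text\n1234\ndifferent\n' | awk 'NR%3!=0'
# prints:
# this is text
# 1234
```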
But, then again, I might have misinterpreted the question...
Upvotes: 1