programings

Reputation: 283

Replace duplicate lines with string by matching them with regular expression

I have a number of lines that I get from a command's output. They follow this pattern:

payload
constant value(u) constant(u)
payload
constant value(u) constant(u)
payload

In this example, (u) stands for one or more unknown characters.

What I care about is "payload", so I remove the "constant value(u) constant(u)" lines (by keeping every second line) using sed:

sed -n '1~2!p'

Sometimes, however, there is a duplicate "constant value(u) constant(u)" line, and that makes sed return all the following "constant value(u) constant(u)" lines instead of the "payload" lines.
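To make the failure concrete, here is a small reproduction. The sample lines and the helper name keep_even_lines are made up for illustration, and the 1~2 address is a GNU sed extension, so this sketch assumes GNU sed:

```shell
# Hypothetical helper: the positional filter from above (GNU sed only).
keep_even_lines() {
  sed -n '1~2!p'
}

# One extra "after" line at the top shifts every later line by one,
# so the even-numbered lines are now the "after" lines, not the payload.
printf '%s\n' \
  'after 1 hour' \
  'after 6 hours' \
  'payload A' \
  'after 2 hours' \
  'payload B' | keep_even_lines
# prints:
#   after 6 hours
#   after 2 hours
```

Because the filter only counts lines, a single unexpected line is enough to flip which half of the input is kept.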

I can use a regular expression to remove all "constant value(u) constant(u)" lines:

sed '/^constant.*constant.*$/d'

The problem is that I need to keep a record that this line was there, even though it is not a "payload" line, so I want to replace the content of the problematic duplicate line with some string. Only the "problematic" duplicate lines should be replaced.

So, here is an example input in the normal situation:

after 1 hour
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
after 2 hours
Cras id consequat nisl.
after 2 hours
Etiam non metus eu velit maximus dapibus.
after 1 hour
Etiam a mi quis ante congue posuere.
after 5 hours
Suspendisse et venenatis ipsum, aliquet pharetra tortor.

This is a "problematic" input:

after 1 hour
after 6 hours
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
after 2 hours
Cras id consequat nisl.
after 2 hours
Etiam non metus eu velit maximus dapibus.
after 1 hour
Etiam a mi quis ante congue posuere.
after 5 hours
Suspendisse et venenatis ipsum, aliquet pharetra tortor.

The desired output (in case of the problematic input above) is:

(no information)
after 6 hours
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
after 2 hours
Cras id consequat nisl.
after 2 hours
Etiam non metus eu velit maximus dapibus.
after 1 hour
Etiam a mi quis ante congue posuere.
after 5 hours
Suspendisse et venenatis ipsum, aliquet pharetra tortor.

What is the most efficient way to approach this? I guess I should match the "problematic" lines with a regular expression and replace them with the desired string, but how?

Upvotes: 0

Views: 338

Answers (2)

Super-intelligent Shade

Reputation: 6449

This command will find 2 consecutive lines starting with constant and replace the 2nd one with X:

sed '/^constant.*$/ { N; s/\(^constant.*\n\)constant.*$/\1X/; }'

UPDATE

Based on the additional information you've provided, this should do the trick:

sed '/^after .*$/ { N; s/^after .*\(\nafter .*\)$/(no information)\1/; }'
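As a quick check, here is that command run against a shortened version of the "problematic" input from the question. The function name mark_first_duplicate is only a label for this sketch, and GNU sed is assumed:

```shell
# Hypothetical wrapper around the command above (GNU sed assumed).
mark_first_duplicate() {
  sed '/^after .*$/ { N; s/^after .*\(\nafter .*\)$/(no information)\1/; }'
}

printf '%s\n' \
  'after 1 hour' \
  'after 6 hours' \
  'Lorem ipsum dolor sit amet, consectetur adipiscing elit.' \
  'after 2 hours' \
  'Cras id consequat nisl.' | mark_first_duplicate
# prints:
#   (no information)
#   after 6 hours
#   Lorem ipsum dolor sit amet, consectetur adipiscing elit.
#   after 2 hours
#   Cras id consequat nisl.
```

Note that N pairs each "after" line with exactly one following line, so with three "after" lines in a row only the first one is replaced.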

UPDATE #2

Another solution provided by @potong in the comments:

sed -E '/^after/{N;s/.*(\nafter)/(no information)\1/;P;D}'

This also works when there are more than 2 "problematic" lines in a row: every "after" line that is directly followed by another "after" line is replaced with (no information), so only the last line of the run is kept.
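A minimal sketch of the multi-line case, assuming GNU sed. The input lines and the name mark_duplicates are invented for the example:

```shell
# Hypothetical wrapper around the P;D variant (GNU sed assumed).
# P prints only the first line of the pattern space, and D deletes it and
# restarts the cycle with the remainder, so each "after" line is compared
# with the line that follows it.
mark_duplicates() {
  sed -E '/^after/{N;s/.*(\nafter)/(no information)\1/;P;D}'
}

printf '%s\n' \
  'after 1 hour' \
  'after 6 hours' \
  'after 3 hours' \
  'Lorem ipsum dolor sit amet, consectetur adipiscing elit.' \
  'after 2 hours' \
  'Cras id consequat nisl.' | mark_duplicates
# prints:
#   (no information)
#   (no information)
#   after 3 hours
#   Lorem ipsum dolor sit amet, consectetur adipiscing elit.
#   after 2 hours
#   Cras id consequat nisl.
```

One caveat of matching on /^after/ alone: a payload line that itself begins with "after" would be treated like a constant line.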

Upvotes: 2

Kyle Banerjee

Reputation: 2794

Are the duplicate lines next to each other? If so, just run the file through uniq first.

Upvotes: 0
