Reputation: 31

How to use sed to pick up a specific paragraph and get rid of repeated ones

I want to extract the paragraphs that begin with '---------read-------' and end with 'finish.' from the log file shown below and, at the same time, remove the repeated paragraphs (keeping only the last occurrence of each identical paragraph).

-------------read-----------  

File reading...  
1 failed  
finish.

[some unrelated messages]

-------------read-----------  
File reading...  
2 failed  
finish.

[some unrelated messages]

-------------read-----------  
File reading...  
1 failed   
finish.  
[some unrelated messages]

In the log file, each paragraph has a fixed first line and a fixed last line, but the lines in between vary, so I am using
sed -n -e "/-------------read-----------/,/finish./ p" $input_file_name
to extract the paragraphs. However, I cannot remove the repeated ones (some paragraphs may be duplicated).

I've tried sed -n "0,/----read---/,/finish/ p" and sed -n "/----read------/,/finish/,{p;q;}", but neither works.

The ideal output would be:

-------------read-----------  
File reading...  
2 failed  
finish.  
-------------read-----------  
File reading...  
1 failed   
finish.

How can I do that? I'd really appreciate it if someone could help!

Upvotes: 1

Answers (4)

potong

Reputation: 58473

This might work for you (GNU sed):

sed -r '/-+read/,/finish\./H;$!d;x;:a;s/(\n-+read.*finish\.)(.*\1)/\2/;ta;s/.//' file

This stores the filtered lines in the hold space, then uses pattern matching and backreferences to remove the repeated paragraphs. It is, however, a fragile solution, as it requires the repeated paragraphs to be exact copies (unlike the example given, where the copies differ in trailing whitespace and blank lines).
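
One way to work around that fragility, sketched here under the assumption that the copies differ only in trailing whitespace and blank lines, is to normalize the input before running the command above. The function name dedupe_exact and the temporary sample file are hypothetical; GNU sed is assumed, as in the answer.

```shell
# dedupe_exact is a hypothetical helper; GNU sed is assumed.
# Stage 1 strips trailing whitespace and drops empty lines so that repeated
# paragraphs become exact copies. Stage 2 is the command from the answer.
dedupe_exact() {
    sed -e 's/[[:space:]]*$//' -e '/^$/d' "$1" |
        sed -r '/-+read/,/finish\./H;$!d;x;:a;s/(\n-+read.*finish\.)(.*\1)/\2/;ta;s/.//'
}

# Demo on a small sample modelled on the log in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
-------------read-----------

File reading...
1 failed
finish.
[some unrelated messages]
-------------read-----------
File reading...
2 failed
finish.
[some unrelated messages]
-------------read-----------
File reading...
1 failed
finish.
[some unrelated messages]
EOF
dedupe_exact "$log"
rm -f "$log"
```

On this sample the first "1 failed" paragraph is dropped and the "2 failed" paragraph is printed before the last "1 failed" one. Paragraphs that differ in any other way are still treated as distinct.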

Upvotes: 0

karakfa

Reputation: 67507

Using a similar logic:

$ awk '/-+read-+/{k=$0; next} 
            k&&NF{sub(/ *$/,""); k=k RS $0}
         /finish/{if(NR==FNR) a[k]++;
                  else if(!--a[k]) print k; 
                  k=""}' log{,}
-------------read-----------
File reading...
2 failed
finish.
-------------read-----------
File reading...
1 failed
finish.

Keeping the last matching record creates the additional complexity.
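
For contrast, here is a minimal sketch of the simpler case, keeping the first occurrence of each paragraph instead of the last. It needs only one pass, because a paragraph can be printed the moment it is first seen. The function name keep_first is hypothetical, and the sketch assumes paragraphs should compare equal after trailing whitespace and empty lines are ignored.

```shell
# keep_first is a hypothetical helper: print each distinct read..finish
# paragraph the first time it appears and skip later copies.
keep_first() {
    awk '
      { sub(/[[:space:]]+$/, "") }              # ignore trailing whitespace
      /^-+read-+$/ { inBlock = 1; block = "" }  # start line opens a new block
      inBlock && NF { block = block $0 "\n" }   # collect non-empty block lines
      inBlock && /^finish\.$/ {                 # end line closes the block
          inBlock = 0
          if (!seen[block]++) printf "%s", block
      }
    ' "$1"
}

# Demo on a small sample modelled on the log in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
-------------read-----------
File reading...
1 failed
finish.
[some unrelated messages]
-------------read-----------
File reading...
2 failed
finish.
[some unrelated messages]
-------------read-----------
File reading...
1 failed
finish.
EOF
keep_first "$log"
rm -f "$log"
```

Here the "1 failed" paragraph comes out before the "2 failed" one, which is not the order the question asks for. Keeping the last copy requires knowing that no later copy exists, hence the second pass over the file in the command above.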

Upvotes: 0

ghoti

Reputation: 46856

I'm not sure where we're supposed to look for the repeats that we don't want to repeat (your sample input doesn't appear to have filenames for example), but you can strip the unnecessary data with a simple toggle:

$ awk '/^-+read-+/ {show=1} show; $1=="finish." {show=0}' inputfile
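
The toggle on its own prints every paragraph, repeats included. As a sketch of how it could be extended to keep only the last copy of each paragraph in a single pass, the version below buffers each paragraph and decides what to print at the end of input. The function name dedupe_last and the array names are hypothetical, and it assumes two paragraphs count as the same once trailing whitespace and empty lines are ignored.

```shell
# dedupe_last is a hypothetical helper: same show toggle as above, but each
# paragraph is buffered and only its last occurrence is printed.
dedupe_last() {
    awk '
      { sub(/[[:space:]]+$/, "") }            # drop trailing blanks so copies compare equal
      /^-+read-+$/ { show = 1; block = "" }   # start line: open a new block
      show && NF   { block = block $0 "\n" }  # collect non-empty lines of the block
      show && $1 == "finish." {               # end line: close the block
          show = 0
          last[block] = ++n                   # position of the latest copy of this text
          text[n] = block
      }
      END {
          # print a block only if no later block has the same text
          for (i = 1; i <= n; i++)
              if (last[text[i]] == i) printf "%s", text[i]
      }
    ' "$1"
}

# Demo on a small sample modelled on the log in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
-------------read-----------

File reading...
1 failed
finish.
[some unrelated messages]
-------------read-----------
File reading...
2 failed
finish.
[some unrelated messages]
-------------read-----------
File reading...
1 failed
finish.
[some unrelated messages]
EOF
dedupe_last "$log"
rm -f "$log"
```

On this sample the "2 failed" paragraph is printed first and the final "1 failed" paragraph second. The trade-off is that all paragraphs are held in memory until the end of the file.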

Upvotes: 0

Ed Morton

Reputation: 203985

$ cat tst.awk
{ gsub(/^[[:space:]]+|[[:space:]]+$/,"") }
!NF { next }
/-------------read-----------/ { inBlock=1; block="" }
inBlock { block = block $0 RS }
/finish/ {
    if (NR==FNR) {
        lastSeen[block] = FNR
    }
    else {
        if (FNR==lastSeen[block]) {
            printf "%s", block
        }
    }
    inBlock=0
}

$ awk -f tst.awk file file
-------------read-----------
File reading...
2 failed
finish.
-------------read-----------
File reading...
1 failed
finish.

Upvotes: 1
