Reputation: 31
I want to extract the blocks that begin with a '---------read-------' line and end with a 'finish.' line from a log file like the one shown below and, at the same time, get rid of repeated blocks (keeping only the last occurrence of each identical block):
-------------read-----------
File reading...
1 failed
finish.
[some unrelated messages]
-------------read-----------
File reading...
2 failed
finish.
[some unrelated messages]
-------------read-----------
File reading...
1 failed
finish.
[some unrelated messages]
In the log file, each block has a fixed first line and a fixed last line, but the lines in between vary, so I am using
sed -n -e "/-------------read-----------/,/finish./ p" $input_file_name
to extract the blocks, but I cannot remove the repeated ones (some blocks may be duplicated).
I've tried using sed -n "0,/----read---/,/finish/ p"
or sed -n "/----read------/,/finish/,{p;q;}"
, but neither of them works.
The ideal output would be:
-------------read-----------
File reading...
2 failed
finish.
-------------read-----------
File reading...
1 failed
finish.
How can I do that? I'd really appreciate it if someone could help!
Upvotes: 1
Views: 170
Reputation: 58473
This might work for you (GNU sed):
sed -r '/-+read/,/finish\./H;$!d;x;:a;s/(\n-+read.*finish\.)(.*\1)/\2/;ta;s/.//' file
This stores the filtered lines in the hold space, then uses pattern matching and backreferences to remove the earlier copies of any repeated paragraph. It is, however, a fragile solution, as it requires the repeated paragraphs to be exact copies (unlike the example given).
Upvotes: 0
Reputation: 67507
Using similar logic:
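$ awk '/-+read-+/ {k=$0; next}                  # header line starts a new record
       k && NF    {sub(/ *$/,""); k=k RS $0}    # append non-blank lines, trailing spaces stripped
       /finish/   {if (NR==FNR) a[k]++;         # pass 1: count each distinct record
                   else if (!--a[k]) print k;   # pass 2: print only the last occurrence
                   k=""}' log{,}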
-------------read-----------
File reading...
2 failed
finish.
-------------read-----------
File reading...
1 failed
finish.
Keeping only the last matching record is what creates the additional complexity.
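If GNU tac is available, one way to sidestep that complexity is to reverse the file so that the last occurrence of each block becomes the first one seen, drop repeats in a single pass, and then reverse the result back. This is only a sketch: keep_last_tac is a made-up helper name, and it assumes the header and 'finish.' markers look like the ones in the question.

```shell
# keep_last_tac FILE (hypothetical helper; assumes GNU tac is installed)
# Prints each distinct read...finish block once, keeping its LAST occurrence.
keep_last_tac() {
  tac "$1" | awk '
    # The input is reversed, so "finish." opens a block and the header closes it.
    /finish\./ { block = ""; inblock = 1 }
    inblock    { sub(/[ \t\r]+$/, ""); block = block $0 "\n" }
    inblock && /-+read-+/ {
      if (!seen[block]++) printf "%s", block   # first sighting here = last in the file
      inblock = 0
    }
  ' | tac
}
```

It would be called as keep_last_tac logfile. The second tac restores the original line order, so the surviving blocks appear in the same order as in the log.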
Upvotes: 0
Reputation: 46856
I'm not sure where we're supposed to look for the repeats that we don't want to repeat (your sample input doesn't appear to have filenames for example), but you can strip the unnecessary data with a simple toggle:
$ awk '/^-+read-+/ {show=1} show; $1=="finish." {show=0}' inputfile
Upvotes: 0
Reputation: 203985
$ cat tst.awk
{ gsub(/^[[:space:]]+|[[:space:]]+$/,"") }
!NF { next }
/-------------read-----------/ { inBlock=1; block="" }
inBlock { block = block $0 RS }
/finish/ {
    if (NR==FNR) {
        lastSeen[block] = FNR
    }
    else {
        if (FNR==lastSeen[block]) {
            printf "%s", block
        }
    }
    inBlock=0
}
$ awk -f tst.awk file file
-------------read-----------
File reading...
2 failed
finish.
-------------read-----------
File reading...
1 failed
finish.
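The script above reads the file twice. If the set of blocks fits comfortably in memory, a single pass might be enough: remember every block together with the index of its last sighting, then print at the end. The following is a rough sketch under that assumption; keep_last and the array names are invented for illustration.

```shell
# keep_last FILE (hypothetical helper; holds all matched blocks in memory)
# Prints each distinct read...finish block once, keeping its LAST occurrence.
keep_last() {
  awk '
    /-+read-+/ { block = ""; inblock = 1 }                 # header starts a new block
    inblock    { sub(/[ \t\r]+$/, ""); block = block $0 "\n" }
    inblock && /finish\./ {
      blocks[++n] = block      # remember blocks in order of appearance
      last[block] = n          # later sightings overwrite earlier ones
      inblock = 0
    }
    END {
      # Print a block only at the position of its final sighting.
      for (i = 1; i <= n; i++)
        if (last[blocks[i]] == i) printf "%s", blocks[i]
    }
  ' "$1"
}
```

It would be called as keep_last file. Because it needs only one read, it should also work on input that cannot be reopened, at the cost of buffering the matched blocks until the end.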
Upvotes: 1