Reputation: 8090
I've got a log file which was combined with stderr which I am trying to clean out. I can isolate and find the stderr "pollution", but am struggling with one minor detail: the removal of a newline
This is the separate stdout which I try to restore:
some message 1234556
more info foo bar
and this is the combined stdout/stderr file from which I am trying to remove the stderr messages:
some message 1234/some/path ERROR
more info only 1 line though
556
more info foo bar
so this is the text that I am trying to get rid of:
/some/path ERROR
more info only 1 line though
including the newlines, so that the separate stdout is restored.
I call:
# get rid of the line AFTER the stderr start
sed -i".bak" -e '/ERROR/{n;d}' *.log
# get rid of the start of stderr
sed -i".bak" -r 's/\/some\/path.*ERROR//' *.log
Unfortunately, the output is now:
some message 1234
556
more info foo bar
Note, the insertion point of the stderr message could be arbitrary (in the middle of a line or at the beginning, anywhere). The only thing I can assume is that stderr is a two-liner, that it starts with /some/path and that it contains an error identifier (ERROR or something else). Also, there could be multiple subsequent stderr messages such as:
some message 1234/some/path ERROR
more info only 1 line though
/some/path ANOTHER_ERR
more info only 1 line though
556
more info foo bar
which I think doesn't pose too much of a problem (there are only 2 kinds, so I can run multiple different matches (ERROR and ANOTHER_ERR)). I also don't care which tool is used, sed or awk ...
Upvotes: 2
Views: 725
Reputation: 5665
Seems perfect for some basic sed. Just use N to gulp the next line into the pattern space.
sed '/ERROR/{N;s/\/.*//;N;s/\n//g}' input.log
N: Append the next line to the pattern space.
That's not far from the OP's attempts with n.
To expand that to the later sample, you have to branch back to the start to see if the N commands brought more error strings into the pattern space:
sed -E ':a /(ERROR|ANOTHER_ERR)/{N;s/\/.*//;N;s/\n//g;b a}'
-E allows two patterns in parens
:a defines a label at the start of the script
b a branches back to :a whenever an error string in the pattern space is found and dealt with.
I prefer to avoid sed -z. It will read the whole file into the pattern space, so it might not be the best choice if this logfile is long, or if you're piping an active stream to sed.
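For anyone who wants to try the loop on the multi-error sample, here is a minimal sketch, assuming GNU sed. The temporary file and the variable names are just for illustration, and I have anchored the match and the deletion on /some/path rather than on the first slash, so that an unrelated slash earlier in the stdout text would not be eaten.

```shell
# Recreate the multi-error sample from the question in a temporary file
# (the file name is arbitrary, chosen by mktemp).
log=$(mktemp)
printf '%s\n' \
  'some message 1234/some/path ERROR' \
  'more info only 1 line though' \
  '/some/path ANOTHER_ERR' \
  'more info only 1 line though' \
  '556' \
  'more info foo bar' > "$log"

# Same idea as the one-liner above, split into -e fragments:
#   :a    label at the top of the script
#   N     append the second stderr line to the pattern space
#   s     delete from the /some/path marker to the end of the pattern space
#   N     append the line where stdout resumes
#   s     remove the newline that stderr left behind
#   ba    loop, in case the joined line carries another stderr marker
cleaned=$(sed -E \
  -e ':a' \
  -e '/\/some\/path (ERROR|ANOTHER_ERR)/{' \
  -e 'N' \
  -e 's/\/some\/path (ERROR|ANOTHER_ERR).*//' \
  -e 'N' \
  -e 's/\n//' \
  -e 'ba' \
  -e '}' "$log")

printf '%s\n' "$cleaned"
rm -f "$log"
```

This relies on the assumption from the question that every stderr message is exactly two lines long.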
Upvotes: 2
Reputation: 12438
Another sed solution without the -z option:
$ sed -E -n '/ERROR/{s@/.*@@;h;n;n;H;n;H;x;s/\n//;p}' input.log
some message 1234556
more info foo bar
Upvotes: 1
Reputation: 203597
With GNU sed for -E and -z:
$ sed -Ez 's:/some/path ERROR\n[^\n]+\n::g' file
some message 1234556
more info foo bar
and if you have multiple errors to handle then just list them or-separated in the regexp:
$ cat file
some message 1234/some/path ERROR
more info only 1 line though
/some/path ANOTHER_ERR
more info only 1 line though
556
more info foo bar
$ sed -Ez 's:/some/path (ERROR|ANOTHER_ERR)\n[^\n]+\n::g' file
some message 1234556
more info foo bar
Alternatively, with GNU awk for multi-char RS:
$ awk -v RS='/some/path ERROR\n[^\n]+\n' -v ORS= '1' file
some message 1234556
more info foo bar
or if you prefer:
$ awk -v RS='^$' -v ORS= '{gsub("/some/path ERROR\n[^\n]+\n","")}1' file
some message 1234556
more info foo bar
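If GNU awk is not available, a line-by-line variant that sticks to POSIX awk features is also possible. The following is only a sketch under the question's assumptions (each stderr message is exactly two lines and starts with /some/path followed by a known identifier); the function name clean_log is made up for this example.

```shell
# clean_log: hypothetical helper; reads the named files, or stdin if none given.
clean_log() {
  awk '
    # Second line of a stderr message: drop it and clear the flag.
    skip { skip = 0; next }
    {
      i = match($0, /\/some\/path (ERROR|ANOTHER_ERR)/)
      if (i) {
        # Keep the stdout text before the marker, without a newline,
        # so the next stdout line is glued straight onto it.
        printf "%s", substr($0, 1, i - 1)
        skip = 1
        next
      }
      print
    }
  ' "$@"
}

# Usage with the multi-error sample from the question:
result=$(printf '%s\n' \
  'some message 1234/some/path ERROR' \
  'more info only 1 line though' \
  '/some/path ANOTHER_ERR' \
  'more info only 1 line though' \
  '556' \
  'more info foo bar' | clean_log)
printf '%s\n' "$result"
```

Since it handles one line at a time, it never holds the whole file in memory, which may matter for long logs.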
Upvotes: 3
Reputation: 85600
You can use the powerful paragraph mode of perl. The -00 command-line option turns paragraph mode on, meaning Perl reads the text paragraph by paragraph rather than line by line (a paragraph is text between two or more newlines).
perl -00 -pe 's/\/.*(ERROR|ANOTHER_ERR)\n.*\n//g' file
To apply the modification in-place, add the -i flag, similar to sed:
perl -00 -pi -e 's/\/.*(ERROR|ANOTHER_ERR)\n.*\n//g' file
Upvotes: 4