orange

Reputation: 8090

sed or awk to remove pattern including newline

I have a log file in which stdout was combined with stderr, and I am trying to clean out the stderr. I can isolate and find the stderr "pollution", but am struggling with one minor detail: the removal of a newline.

This is the clean stdout that I am trying to restore:

some message 1234556
more info foo bar

And this is the combined stdout/stderr file from which I am trying to remove the stderr messages:

some message 1234/some/path ERROR
  more info only 1 line though
556
more info foo bar

So this is the text that I am trying to get rid of:

/some/path ERROR
more info only 1 line though

including the newlines, so that the separate stdout is restored.

I call:

# get rid of the line AFTER the stderr start
sed -i".bak" -e '/ERROR/{n;d}' *.log

# get rid of the start of stderr
sed -i".bak" -r 's/\/some\/path.*ERROR//' *.log

Unfortunately, the output is now:

some message 1234
556
more info foo bar

Note that the insertion point of the stderr message is arbitrary (in the middle of a line, at the beginning, anywhere). The only things I can assume are that the stderr message is a two-liner, that it starts with /some/path, and that it contains an error identifier (ERROR or something else). There could also be multiple consecutive stderr messages, such as:

some message 1234/some/path ERROR
  more info only 1 line though
/some/path ANOTHER_ERR
  more info only 1 line though
556
more info foo bar

which I think doesn't pose too much of a problem: there are only 2 kinds, so I can run a separate match for each (ERROR and ANOTHER_ERR). I also don't care which tool is used, sed or awk.

Upvotes: 2

Views: 725

Answers (4)

stevesliva

Reputation: 5665

Seems perfect for some basic sed. Just use N to gulp the next line into the pattern space.

sed '/ERROR/{N;s/\/.*//;N;s/\n//g}' input.log

  • N Append the next line to the pattern space
  • s/\/.*// Delete everything from the first forward slash onward (this includes the appended line)
  • N Append the next line to the pattern space
  • s/\n//g Delete all line breaks

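To try the command against the sample from the question, a small reproducible check might look like the following. The scratch directory and the file name sample.log are placeholders, not part of the original post, and the behaviour shown assumes GNU sed.

```shell
# Recreate the polluted sample from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'some message 1234/some/path ERROR' \
  '  more info only 1 line though' \
  '556' \
  'more info foo bar' > "$tmpdir/sample.log"

# Same command as above: append a line, cut from the first slash,
# append the next line, then remove the remaining line break.
cleaned=$(sed '/ERROR/{N;s/\/.*//;N;s/\n//g}' "$tmpdir/sample.log")
printf '%s\n' "$cleaned"
```

With the sample above this should print the two restored stdout lines, "some message 1234556" and "more info foo bar".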
That's not far from the OP's attempts with n.

To expand that to the later sample, you have to branch back to the start to see whether the N commands brought more error strings into the pattern space:

sed -E ':a /(ERROR|ANOTHER_ERR)/{N;s/\/.*//;N;s/\n//g;b a}'

  • Use -E so the two patterns can be written as an alternation in parentheses
  • Add a label :a
  • b a branch back to :a whenever an error string in the pattern space is found and dealt with.

I prefer to avoid sed -z. It will read the whole file into the pattern space, so it might not be the best choice if this logfile is long, or if you're piping an active stream to sed.
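If the log is being piped in as a live stream, a line-at-a-time awk filter is another way to avoid holding the whole file in memory. This is only a sketch: the function name strip_stderr is made up for illustration, and it assumes what the question states, namely that each stderr message starts with /some/path, ends its first line with ERROR or ANOTHER_ERR, and is followed by exactly one continuation line.

```shell
# strip_stderr: hypothetical helper, reads a polluted log on stdin
# and writes the cleaned log to stdout.
strip_stderr() {
  awk '
    skip { skip = 0; next }                 # drop the one continuation line
    {
      i = index($0, "/some/path")           # start of the stderr fragment, 0 if absent
      if (i && /(ERROR|ANOTHER_ERR)$/) {
        printf "%s", substr($0, 1, i - 1)   # keep the stdout prefix, without a newline
        skip = 1
        next
      }
      print                                 # ordinary line, or the tail of a split line
    }
  '
}
```

It could be used as `strip_stderr < input.log > cleaned.log`. Because the stdout prefix is printed without a newline, the next surviving line is glued onto it, which is what restores the split line.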

Upvotes: 2

Allan

Reputation: 12438

Another sed solution without the -z option:

$ sed -E -n '/ERROR/{s@/.*@@;h;n;n;H;n;H;x;s/\n//;p}' input.log
some message 1234556
more info foo bar

Upvotes: 1

Ed Morton

Reputation: 203597

With GNU sed for -E and -z:

$ sed -Ez 's:/some/path ERROR\n[^\n]+\n::g' file
some message 1234556
more info foo bar

If you have multiple errors to handle, list them as alternatives separated by | in the regexp:

$ cat file
some message 1234/some/path ERROR
  more info only 1 line though
/some/path ANOTHER_ERR
  more info only 1 line though
556
more info foo bar

$ sed -Ez 's:/some/path (ERROR|ANOTHER_ERR)\n[^\n]+\n::g' file
some message 1234556
more info foo bar

Alternatively, with GNU awk for multi-char RS:

$ awk -v RS='/some/path ERROR\n[^\n]+\n' -v ORS= '1' file
some message 1234556
more info foo bar

or if you prefer:

$ awk -v RS='^$' -v ORS= '{gsub("/some/path ERROR\n[^\n]+\n","")}1' file
some message 1234556
more info foo bar

Upvotes: 3

Inian

Reputation: 85600

You can use Perl's paragraph mode. The -00 command-line option turns paragraph slurp mode on, meaning Perl reads text paragraph by paragraph rather than line by line (a paragraph is text delimited by two or more consecutive newlines). Since the sample log contains no blank lines, it is read as a single record, so the substitution can match across line breaks.

perl -00 -pe 's/\/.*(ERROR|ANOTHER_ERR)\n.*\n//g' file

To apply the modification in place, add the -i flag, as with sed:

perl -00 -pi -e 's/\/.*(ERROR|ANOTHER_ERR)\n.*\n//g' file

Upvotes: 4
