Reputation: 15

Sed/awk - How to remove newline characters between start pattern and end pattern.

Example logs:

2018-01-01 11:30:22 xxx Parsing xxx
2018-01-01 11:30:23 driver queryId=<xxx> Parsing command: select *
from table 
limit 10
2018-01-01 11:30:25 Parsing completed 
2018-01-01 11:30:28 xxxxxx
2018-01-01 11:30:40 driver queryId=<xxx> Parsing command: select * from table group by column
2018-01-01 11:30:45 Parsing completed 
2018-01-01 11:30:51 xxxxxx
2018-01-01 11:30:52 xxx Parsing xxx
2018-01-01 11:30:54 driver queryId=<xxx> Parsing command: select 

*
from table 

order by column

limit 20
2018-01-01 11:30:56 Parsing completed 
2018-01-01 11:30:59 xxxxxx

I want to remove the newlines between "Parsing command:" and the next line that starts with "2018", and the output should contain only the text found between those two patterns.

Example parsing:

2018-01-01 11:30:54 driver queryId=<xxx> Parsing command: select 

*
from table 

order by column

limit 20
2018-01-01 11:30:56 Parsing completed

The output for the above example should be:

select * from table order by column limit 20

Upvotes: 1

Answers (5)

Bach Lien

Reputation: 1060

A sed script, saved as extractcommand.sed:

#!/usr/bin/sed -f
/Parsing command:/!{d;b}          # delete+continue if 'Parsing command' not found
:a                                # if found, then start a loop with label (a)
  s/.*Parsing command:\s*//       #   delete that 'Parsing command'
  /Parsing completed/{            #   if found 'Parsing completed'
    s:\n[^\n]*Parsing completed:: #     then delete that 'Parsing completed'
    s:\n: :g                      #     change all \n to space
    s:  *: :g                     #     remove all extra spaces (optional)
    b                             #     break the loop (and print as default)
  }                               #
  N                               #   load another line into buffer
  ba                              #   loop to label (a)

Test:

$ ./extractcommand.sed <sample.log 
select * from table limit 10 
select * from table group by column 
select * from table order by column limit 20

Upvotes: 1

Ed Morton

Reputation: 203522

Keep it simple. Given your first posted input file, using GNU awk for multi-char RS and RT:

$ awk -F'Parsing command: ' -v RS='[^\n]+Parsing completed' 'RT{gsub(/\s+/," ",$NF); print $NF}' file
select * from table limit 10
select * from table group by column
select * from table order by column limit 20

or with any awk:

$ cat tst.awk
/Parsing completed/ {
    gsub(/ +/," ",buf)
    sub(/.*Parsing command: /,"",buf)
    print buf
    buf = ""
}
{ buf = buf " " $0 }

$ awk -f tst.awk file
select * from table limit 10
select * from table group by column
select * from table order by column limit 20

Upvotes: 0

sjsam

Reputation: 21965

sed could also be used, though it looks a bit scary :-/

sed -nE '/Parsing command:/{
s/^.*Parsing command://;:l1;N;/Parsing completed[[:blank:]]*$/!bl1;
s/2018-.*Parsing completed[[:blank:]]*$//;
s/\n/ /g;s/^[[:blank:]]*//;s/[[:blank:]]+/ /gp}' logfile

Note: the last two substitutions handle some fine-grained formatting, and the p flag on the final s command takes care of printing.

Output

select * from table limit 10 
select * from table group by column 
select * from table order by column limit 20

All good :-)

Recommended reading: sed branching statements.
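
As a minimal illustration of the branching mentioned above (a generic sketch, not the command from this answer), a label plus a conditional branch can buffer several lines before one substitution runs over all of them. The three input lines are taken from the question's first query; the variable name joined is only for illustration.

```shell
# :a sets a label, N appends the next input line to the pattern space,
# and $!ba jumps back to :a on every line except the last one.
# Once all lines are buffered, the embedded newlines are turned into spaces.
joined=$(printf 'select *\nfrom table\nlimit 10\n' |
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
printf '%s\n' "$joined"
```

Passing each command with its own -e keeps the label on a separate script line, which should also work with sed implementations that do not accept a semicolon after a label.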

Upvotes: 1

RomanPerekhrest

Reputation: 92854

Awk solution:

awk '/Parsing command:/{ f=1; sub(/.*Parsing command: /,""); q=$0; next }
     f && /^2018/{ gsub(/[[:space:]]{2,}/, " ", q); print q; f=0 }
     NF && f{ q=q" "$0 }' logfile

The output:

select * from table limit 10
select * from table group by column
select * from table order by column limit 20
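
The /^2018/ test ties the command to a single year. A hedged variant, assuming every log line starts with a YYYY-MM-DD timestamp (the question only shows 2018 dates, so this is an assumption), could match any such date instead. The 2019 sample lines and the variable name out are made up for illustration, and the whitespace squeeze is written without an interval expression so that older awk versions should accept it.

```shell
# Feed a small, invented log through an awk program that ends a query
# at the next line starting with a date rather than with a fixed year.
out=$(printf '%s\n' \
  '2019-05-01 10:00:00 driver queryId=<xxx> Parsing command: select *' \
  'from table' \
  '' \
  'limit 10' \
  '2019-05-01 10:00:02 Parsing completed' |
awk '
# Start of a query: keep only the text after the marker
/Parsing command:/ { f = 1; sub(/.*Parsing command: */, ""); q = $0; next }
# Next timestamped line: squeeze whitespace, print, and stop collecting
f && /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { gsub(/[ \t]+/, " ", q); print q; f = 0 }
# Non-empty continuation line: append it to the query
f && NF { q = q " " $0 }
')
printf '%s\n' "$out"
```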

Upvotes: 1

melpomene

Reputation: 85767

Here's a pretty short solution using perl instead of sed/awk:

perl -ne 's/\n/ /; print +(s/^.*Parsing command: // .. /^2018/ or next) =~ /E/ ? "\n" : $_' input.log

The idea:

We loop over the input lines (-n). For each line we execute code (-e ...):

First we replace the newline by a space (s/\n/ /).
Then we check a COND1 .. COND2 condition, which is true for all lines in the range between COND1 and COND2.
Our first condition is the substitution s/^.*Parsing command: //, which is true if it managed to remove some prefix of the input line ending with Parsing command:. This is the beginning of our range.
Our second condition is the match /^2018/, which is true if the input line starts with 2018. This is the end of our range.
If this check fails, we just skip to the next input line (... or next). For the rest of the code we're only considering lines within the range.
The value returned by .. is a sequence number. On the last line of the range, that number has the string E0 appended to it. We check for /E/ to detect this last line (the one starting with 2018), because we don't want to print it.
If we're at the last line, we just output a newline ("\n"), otherwise we print the line (with the final newline converted to space from the first substitution).

Upvotes: 1
