Express Cars Inc

Reputation: 15

sed/awk: how to remove newline characters between a start pattern and an end pattern

Example logs:

2018-01-01 11:30:22 xxx Parsing xxx
2018-01-01 11:30:23 driver queryId=<xxx> Parsing command: select *
from table 
limit 10
2018-01-01 11:30:25 Parsing completed 
2018-01-01 11:30:28 xxxxxx
2018-01-01 11:30:40 driver queryId=<xxx> Parsing command: select * from table group by column
2018-01-01 11:30:45 Parsing completed 
2018-01-01 11:30:51 xxxxxx
2018-01-01 11:30:52 xxx Parsing xxx
2018-01-01 11:30:54 driver queryId=<xxx> Parsing command: select 

*
from table 

order by column

limit 20
2018-01-01 11:30:56 Parsing completed 
2018-01-01 11:30:59 xxxxxx

I want to remove the newlines between "Parsing command:" and the next line that starts with "2018". The output should contain only the text found between those two patterns.

Example parsing:

2018-01-01 11:30:54 driver queryId=<xxx> Parsing command: select 

*
from table 

order by column

limit 20
2018-01-01 11:30:56 Parsing completed

The output for the above example should be:

select * from table order by column limit 20

Upvotes: 1

Views: 547

Answers (5)

Bach Lien

Reputation: 1060

sed script (file extractcommand.sed):

#!/usr/bin/sed -f
/Parsing command:/!d              # skip lines without 'Parsing command' (d starts the next cycle)
:a                                # if found, then start a loop with label (a)
  s/.*Parsing command:\s*//       #   delete that 'Parsing command'
  /Parsing completed/{            #   if found 'Parsing completed'
    s:\n[^\n]*Parsing completed:: #     then delete that 'Parsing completed'
    s:\n: :g                      #     change all \n to space
    s:  *: :g                     #     remove all extra spaces (optional)
    b                             #     break the loop (and print as default)
  }                               #
  N                               #   load another line into buffer
  ba                              #   loop to label (a)

Test:

$ ./extractcommand.sed <sample.log 
select * from table limit 10 
select * from table group by column 
select * from table order by column limit 20
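
The label / N / branch loop used in the script above can be tried in isolation. The following is only a minimal sketch on made-up three-line input (not the poster's log, and not a replacement for the script): it joins every input line into one, which is the same accumulate-then-substitute idea without the start and end patterns. The variable name joined is just for illustration.

```shell
# Join all input lines with spaces using a sed loop.
#   :a        define label a
#   N         append the next input line to the pattern space
#   $!ba      unless this is the last input line, branch back to a
#   s/\n/ /g  finally turn every embedded newline into a space
joined=$(printf 'select *\nfrom table\nlimit 10\n' |
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
printf '%s\n' "$joined"
```

This should print select * from table limit 10 on a single line.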

Upvotes: 1

Ed Morton

Reputation: 203522

Keep it simple. Given your first posted input file, using GNU awk for multi-char RS and RT:

$ awk -F'Parsing command: ' -v RS='[^\n]+Parsing completed' 'RT{gsub(/\s+/," ",$NF); print $NF}' file
select * from table limit 10
select * from table group by column
select * from table order by column limit 20

or with any awk:

$ cat tst.awk
/Parsing completed/ {
    gsub(/ +/," ",buf)
    sub(/.*Parsing command: /,"",buf)
    print buf
    buf = ""
}
{ buf = buf " " $0 }

$ awk -f tst.awk file
select * from table limit 10
select * from table group by column
select * from table order by column limit 20

Upvotes: 0

sjsam

Reputation: 21965

sed can also be used, though it looks a bit scary :-/

sed -nE '/Parsing command:/{
s/^.*Parsing command://;:l1;N;/Parsing completed[[:blank:]]*$/!bl1;
s/2018-.*Parsing completed[[:blank:]]*$//;
s/\n/ /g;s/^[[:blank:]]*//;s/[[:blank:]]+/ /gp}' logfile

Note: the last two substitutions only tidy up the whitespace, and the p flag on the last s command takes care of printing.


Output

select * from table limit 10 
select * from table group by column 
select * from table order by column limit 20 

All good :-)
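
To see what the -n option and the p flag do together, here is a small sketch on made-up input (the variable name printed is only for illustration). With -n nothing is printed automatically, so only lines where the substitution matched are shown.

```shell
# With -n, sed prints nothing by default; the p flag on s prints a line
# only when that substitution matched.
printed=$(printf 'a   b\nno-blanks-here\n' | sed -nE 's/[[:blank:]]+/ /gp')
printf '%s\n' "$printed"
```

Only the first line contains blanks, so the only output should be a b.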


Recommended reading: sed branching statements.

Upvotes: 1

RomanPerekhrest

Reputation: 92854

Awk solution:

awk '/Parsing command:/{ f=1; sub(/.*Parsing command: /,""); q=$0; next }
     f && /^2018/{ gsub(/[[:space:]]{2,}/, " ", q); print q; f=0 }
     NF && f{ q=q" "$0 }' logfile

The output:

select * from table limit 10
select * from table group by column
select * from table order by column limit 20
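
The command above ends a query at any line starting with 2018. If the logs span several years, one possible variation (a sketch, not the answerer's code) is to end the query at any line that starts with a YYYY-MM-DD date. This assumes every log line starts with such a date and that no line of the SQL itself does. The function name extract_queries is hypothetical.

```shell
# Hypothetical helper: print each "Parsing command:" query on a single line.
# Reads the files given as arguments, or standard input if there are none.
extract_queries() {
  awk '
    # Start of a query: keep only the text after the marker.
    /Parsing command:/ {
      sub(/.*Parsing command:[ \t]*/, "")
      q = $0; f = 1; next
    }
    # Assumption: any line starting with a YYYY-MM-DD date ends the query.
    f && /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      gsub(/[ \t]+/, " ", q)   # collapse runs of blanks
      sub(/ $/, "", q)         # drop a trailing blank
      print q; f = 0
    }
    # Continuation line: append it, skipping empty lines.
    f && NF { q = q " " $0 }
  ' "$@"
}
```

It would be called as extract_queries logfile. It avoids interval expressions such as {2,} so that it should also run on older awk implementations.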

Upvotes: 1

melpomene

Reputation: 85767

Here's a pretty short solution using perl instead of sed/awk:

perl -ne 's/\n/ /; print +(s/^.*Parsing command: // .. /^2018/ or next) =~ /E/ ? "\n" : $_' input.log

The idea:

We loop over the input lines (-n). For each line we execute code (-e ...):

  • First we replace the newline by a space (s/\n/ /).
  • Then we check a COND1 .. COND2 condition, which is true for all lines in the range between COND1 and COND2.
  • Our first condition is the substitution s/^.*Parsing command: //, which is true if it managed to remove some prefix of the input line ending with Parsing command:. This is the beginning of our range.
  • Our second condition is the match /^2018/, which is true if the input line starts with 2018. This is the end of our range.
  • If this check fails, we just skip to the next input line (... or next). For the rest of the code we're only considering lines within the range.
  • The value returned by .. is a sequence number that starts at 1. On the last line of the range, that number has E0 appended to it (for example 3E0). We check for /E/ to detect the last line of the range (the one starting with 2018), because we don't want to print it.
  • If we're at the last line, we just output a newline ("\n"), otherwise we print the line (with the final newline converted to space from the first substitution).

Upvotes: 1
