Reputation: 15
Example logs:
2018-01-01 11:30:22 xxx Parsing xxx
2018-01-01 11:30:23 driver queryId=<xxx> Parsing command: select *
from table
limit 10
2018-01-01 11:30:25 Parsing completed
2018-01-01 11:30:28 xxxxxx
2018-01-01 11:30:40 driver queryId=<xxx> Parsing command: select * from table group by column
2018-01-01 11:30:45 Parsing completed
2018-01-01 11:30:51 xxxxxx
2018-01-01 11:30:52 xxx Parsing xxx
2018-01-01 11:30:54 driver queryId=<xxx> Parsing command: select
*
from table
order by column
limit 20
2018-01-01 11:30:56 Parsing completed
2018-01-01 11:30:59 xxxxxx
I want to remove the newlines between "Parsing command:" and the next line matching the "2018" pattern, and the output should contain only the text found between those two patterns.
Example parsing:
2018-01-01 11:30:54 driver queryId=<xxx> Parsing command: select
*
from table
order by column
limit 20
2018-01-01 11:30:56 Parsing completed
The output for the above example should be:
select * from table order by column limit 20
Upvotes: 1
Views: 547
Reputation: 1060
sed script (file extractcommand.sed):
#!/usr/bin/sed -f
/Parsing command:/!{d;b} # delete+continue if 'Parsing command' not found
:a # if found, then start a loop with label (a)
s/.*Parsing command:\s*// # delete that 'Parsing command'
/Parsing completed/{ # if found 'Parsing completed'
s:\n[^\n]*Parsing completed:: # then delete that 'Parsing completed'
s:\n: :g # change all \n to space
s:  *: :g # collapse runs of spaces into a single space (optional)
b # break the loop (and print as default)
} #
N # load another line into buffer
ba # loop to label (a)
Test:
$ ./extractcommand.sed <sample.log
select * from table limit 10
select * from table group by column
select * from table order by column limit 20
Upvotes: 1
Reputation: 203522
Keep it simple. Given your first posted input file, using GNU awk for multi-char RS and RT:
$ awk -F'Parsing command: ' -v RS='[^\n]+Parsing completed' 'RT{gsub(/\s+/," ",$NF); print $NF}' file
select * from table limit 10
select * from table group by column
select * from table order by column limit 20
or with any awk:
$ cat tst.awk
/Parsing completed/ {
gsub(/ +/," ",buf)
sub(/.*Parsing command: /,"",buf)
print buf
buf = ""
}
{ buf = buf " " $0 }
$ awk -f tst.awk file
select * from table limit 10
select * from table group by column
select * from table order by column limit 20
Upvotes: 0
Reputation: 21965
sed could also be used, though it looks a bit scary :-/
sed -nE '/Parsing command:/{
s/^.*Parsing command://;:l1;N;/Parsing completed[[:blank:]]*$/!bl1;
s/2018-.*Parsing completed[[:blank:]]*$//;
s/\n/ /g;s/^[[:blank:]]*//;s/[[:blank:]]+/ /gp}' logfile
Note that the last two substitutions are for some fine-grained formatting, and the p flag on the last s takes care of printing.
Output
select * from table limit 10
select * from table group by column
select * from table order by column limit 20
All good :-)
Recommended reading: sed branching statements.
Upvotes: 1
Reputation: 92854
Awk solution:
awk '/Parsing command:/{ f=1; sub(/.*Parsing command: /,""); q=$0; next }
f && /^2018/{ gsub(/[[:space:]]{2,}/, " ", q); print q; f=0 }
NF && f{ q=q" "$0 }' logfile
The output:
select * from table limit 10
select * from table group by column
select * from table order by column limit 20
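The /^2018/ test above ties the script to a single year. A minimal sketch of a variant, assuming every log record starts with a YYYY-MM-DD date followed by a space (the function name extract_queries is made up for illustration, and only POSIX awk features are used):

```shell
# Hypothetical helper: print each logged query on one line.
# Assumption: every log record starts with a "YYYY-MM-DD " timestamp.
extract_queries() {
  awk '
    /Parsing command:/ {                  # start collecting a query
      sub(/.*Parsing command: */, "")
      q = $0; f = 1; next
    }
    f && /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      gsub(/[ \t]+/, " ", q)              # squeeze repeated blanks
      print q; f = 0; next                # next timestamped record ends the query
    }
    f && NF { q = q " " $0 }              # continuation line of the query
  ' "$@"
}
```

Called as extract_queries logfile (or with the log on standard input), it should print one query per line for the sample logs in the question.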
Upvotes: 1
Reputation: 85767
Here's a pretty short solution using perl instead of sed/awk:
perl -ne 's/\n/ /; print +(s/^.*Parsing command: // .. /^2018/ or next) =~ /E/ ? "\n" : $_' input.log
The idea:
We loop over the input lines (-n). For each line we execute the code (-e ...):
First we replace the trailing newline with a space (s/\n/ /).
Then comes a COND1 .. COND2 condition, which is true for all lines in the range between COND1 and COND2.
COND1 is s/^.*Parsing command: //, which is true if it managed to remove some prefix of the input line ending with "Parsing command: ". This is the beginning of our range.
COND2 is /^2018/, which is true if the input line starts with 2018. This is the end of our range.
If we're not inside the range, we skip to the next loop iteration (... or next). For the rest of the code we're only considering lines within the range.
The result of .. is a sequence number. The last line in the range has E0 appended to it. We check for /E/ to detect the last line of the range (the one starting with 2018), because we don't want to print it.
For that line we print a newline ("\n") instead; otherwise we print the line (with the final newline converted to a space by the first substitution).
Upvotes: 1