Reputation: 1
I'm trying to find a particular log entry in a fairly large log file (say 250 MB). Every entry starts with
YYYY-MM-DD time:
followed by one or more lines of text that I want to match,
and it ends at a newline followed by the next date-time prefix.
The question is how to match text inside a single entry when the entry spans multiple lines, without running past the start of the next entry. Neither the order of the values nor the lines they appear on is known in advance.
I tried the following:
grep -Pzio '^(\d{4}-\d{2}-\d{2} timePattern)(?=[\s\S]*?Value1)(?=[\s\S]*?Value2)(?=[\s\S]*?Value3)[\s\S]*?(?=(\n\1|\Z))' file.log
But it either hits the PCRE backtracking limit, even with the lazy [\s\S]*?, or it starts at an earlier entry that does not match and swallows many following entries with [\s\S]* until all three values have finally been found, so I get back a huge block of text.
I think the only real difficulty here is the multiline matching. I'd appreciate any help!
EDIT 0: I need to find only one log that has all the values that I'm trying to match.
EDIT 1: Example
2018-02-09 03:52:46,347 Activity=SomeAct
@Request=<S:Body><S:RQ><S:Info><S:Key><S:First>Value1</S:First><S:Second>Value2</S:Second></S:Key></S:Info></S:RQ></S:Body>
@Response=<SOAP-ENV:Body><S:RS><S:StatusCode>FAILURE</S:StatusCode></S:RS></SOAP-ENV:Body>
2018-02-09 03:52:51,377 Activity=SomeAct
@Request=<S:Body><S:RQ><S:Info><S:Key><S:First>Value1</S:First><S:Second>Value2</S:Second></S:Key></S:Info></S:RQ></S:Body>
@Response=<SOAP-ENV:Body><S:RS><S:StatusCode>SUCCESSFUL</S:StatusCode></S:RS></SOAP-ENV:Body>
2018-02-09 03:52:52,112 Activity=SomeAct
@Response=<SOAP-ENV:Body><S:RS><S:StatusCode>FAILURE</S:StatusCode></S:RS></SOAP-ENV:Body>
@Request=<S:Body><S:RQ><S:Info><S:Key><S:First>Value1</S:First><S:Second>Value3</S:Second></S:Key></S:Info></S:RQ></S:Body>
I need only the record that contains Value1 and Value2 and has the SUCCESSFUL status. However, the response does not necessarily come after the request, <First> does not necessarily come before <Second>, and RS/RQ do not necessarily fit on a single line.
Upvotes: 0
Views: 156
Reputation: 189317
It's not really clear what you want to find, but a common approach is to use Awk with a custom record separator so that a record can span multiple lines. Or you can collect the records manually:
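awk '/^YYYY-MM-DD time: / { if (seen1 && seen2 && seen3) print rec;
    seen1 = seen2 = seen3 = 0; rec = "" }
  # Append the current line to the record being collected
  { rec = (rec ? rec "\n" $0 : $0) }
  /Value1/ { seen1++ }
  /Value2/ { seen2++ }
  /Value3/ { seen3++ }
  # Also check the final record when the end of the file is reached
  END { if (seen1 && seen2 && seen3) print rec; }' file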
This collects into rec the lines we have seen since the previous separator. When we see a new separator, we print the previous value of rec if all the "seen" flags are set, indicating that every regex matched somewhere in that record, and then start over.
A common omission is forgetting to also do this in the END block, when we reach the end of the file.
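As a concrete illustration, here is a minimal sketch of the same idea applied to the sample from EDIT 1. It rests on some assumptions: the function name find_record is hypothetical, the timestamp is assumed to look like "2018-02-09 03:52:46,347 " at the start of a line, and the three values are treated as fixed strings (via index()) rather than regexes, so characters like < and / need no escaping.

```shell
# Hypothetical helper: print every record in file $1 that contains all of the
# fixed strings $2, $3 and $4 somewhere in its lines, in any order.
find_record() {
    awk -v p1="$2" -v p2="$3" -v p3="$4" '
        # Print the collected record only if all three strings were seen in it.
        function emit() { if (rec != "" && s1 && s2 && s3) print rec }

        # Assumed record start: "YYYY-MM-DD HH:MM:SS,mmm " at the beginning of a line.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9]+ / {
            emit()                      # close the previous record
            rec = ""; s1 = s2 = s3 = 0  # start a new one
        }

        # Append the current line to the record being collected.
        { rec = (rec == "" ? $0 : rec "\n" $0) }

        # Fixed-string checks, one line at a time.
        index($0, p1) { s1 = 1 }
        index($0, p2) { s2 = 1 }
        index($0, p3) { s3 = 1 }

        # The last record has no following separator, so check it here.
        END { emit() }
    ' "$1"
}
```

With the sample saved as file.log, calling find_record file.log Value1 Value2 SUCCESSFUL should print only the 03:52:51,377 record. Because awk reads the file line by line and keeps just one record in memory, the size of the file should not matter much. One limitation of this sketch: a value that is itself split across two lines will not be found.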
Upvotes: 1