Tom Sebastian

Reputation: 3433

How to find occurrence while patterns are overlapping

Context: This is a log-analysis task. I am writing a regex program to count occurrences of certain requests sent from a client to a server. I have the client log file, which contains these requests along with other log entries.

Problem: When a request message is sent to the server, the client should write 2 log statements like:

sending..
message_type

When the above pair of statements (the combined pattern) is found, we can say one request has been sent.

We expect the log file content to look like:

sending..
message_type
...//other text
sending..
message_type
...//other text
sending..
message_type

From the above log we can say the client has sent 3 messages. But in the actual log file the patterns somehow overlap, as below (not for all messages, but for some):

sending..(1)
...//other text
sending..(2)
message_type(2)
...//other text
message_type(1)
sending..(3)
message_type(3)

There are still 3 requests (I numbered the messages to make this easier to follow), but the patterns overlap, i.e. the second message got logged before the first message was fully logged. The above explanation is only for understanding. Below is part of the original log:
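To make the effect concrete, here is a minimal Java sketch run against the simplified, numbered log above. The paired pattern `sending\.\..*?message_type` is an illustrative assumption, not the pattern used on the real log, and the class and method names are hypothetical. Counting the second part alone is shown only for contrast; it assumes that part never appears outside a request.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapDemo {
    // The simplified, interleaved log from the explanation above.
    static final String LOG =
        "sending..(1)\n"
        + "...//other text\n"
        + "sending..(2)\n"
        + "message_type(2)\n"
        + "...//other text\n"
        + "message_type(1)\n"
        + "sending..(3)\n"
        + "message_type(3)\n";

    // Counts matches the way Matcher.find() reports them: each search
    // resumes where the previous match ended, so matches never overlap.
    static int count(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Paired pattern: the first match runs from "sending..(1)" to
        // "message_type" of (2), swallowing "sending..(2)" on the way.
        int paired = count("sending\\.\\..*?message_type", Pattern.DOTALL, LOG);
        // Counting the second part on its own is unaffected by interleaving.
        int secondPartOnly = count("^message_type", Pattern.MULTILINE, LOG);
        System.out.println("paired: " + paired);
        System.out.println("second part only: " + secondPartOnly);
    }
}
```

On this sample the paired pattern reports 2 and the second-part count reports 3, because message_type(1) is left without a preceding unmatched sending.. line.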

Original log

Send message to server:
Created post notification log dir
Created post notification log dir
Created post notification log dir
Send message to server:
Created post notification log dir
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></params></response></message>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></response></message>

Here, as per the explanation, a single request is identified by its 2 parts:

Send message to server:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>

What I tried

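import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogMatcher {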

    static final String create_session= "Send message to server(.){10,1000}(<\\?xml(.){10,500}type=\"createsession\"(.){1,100}</message>)";



    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(new File("D:/dummy.txt"))));//I put the above log in this file
        StringBuilder b = new StringBuilder();
        String line = "";
        while((line = reader.readLine()) != null ){     
            b.append(line);
        }

        findMatch(b,"Send message to server","Send message to server");
        findMatch(b,create_session,"create_session");

    }
    private static int findMatch(StringBuilder b,String pattern, String type) {
        int count =0;
        Pattern regex = Pattern.compile(pattern,Pattern.MULTILINE);
        Matcher regexMatcher = regex.matcher(b.toString());
        while (regexMatcher.find()) {
            count++;
        } 
        System.out.printf("%25s%2d\n",type+": ",count);
        return count;
    }
}

Current Output

The intention is to find the number of createsession messages sent.

Send message to server:  2
        create_session:  1

Expected output

From the log it is clear that 2 messages were sent, so the output should be:

 Send message to server:  2
         create_session:  2

You can see the pattern I have tried in my code. Can anyone suggest a pattern to get the desired result?

Note: One could ask why not simply count Send message to server alone. That does not work because the log contains many types of messages, such as login, closesession, etc., and all of them have Send message to server as the first part. The message types are also logged on their own for some other purpose, so we can't rely on either part alone (only the combination is reliable).

Upvotes: 2

Views: 82

Answers (1)

Mariano

Reputation: 6511

Find occurrence of certain requests send to a server from a client.

"other way" that you can neglect here , that will have like Store in DB : instead of Send message to server and the xml message.

I'd propose a new strategy:

  1. Use only 1 regex to match all alternatives, so the log is parsed only once (improving performance on long files).
  2. Match type=\"createsession\" xmls independently.
  3. Also match Store in DB: xmls, but ignore them (don't increment the counter).

We can use the following expression to count the messages sent to the server.

^(?<toserver>Send message to server:)
  • Notice I'm using a named group, which we can later reference as regexMatcher.group("toserver") to increment the counter.

And match the target xmls independently as:

^(?<message><\? *xml\b.{10,500} type *= *\"createsession\")
  • Later referenced as regexMatcher.group("message").
  • We'll use an independent counter.
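As a rough illustration of the two-counter idea, the sketch below runs one alternation with the two named groups over a shortened, made-up log and checks which group took part in each match. The sample text and the class name `NamedGroupCount` are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupCount {
    // One alternation, two named groups; at most one group is non-null per match.
    static final Pattern P = Pattern.compile(
        "^(?:(?<toserver>Send message to server:)"
        + "|(?<message><\\? *xml\\b.{10,500} type *= *\"createsession\"))",
        Pattern.MULTILINE);

    // Shortened sample: two "Send" lines, two createsession xmls, one login xml.
    static final String LOG =
        "Send message to server:\n"
        + "Send message to server:\n"
        + "<?xml version=\"1.0\"?><message><request type=\"createsession\"/></message>\n"
        + "<?xml version=\"1.0\"?><message><request type=\"login\"/></message>\n"
        + "<?xml version=\"1.0\"?><message><request type=\"createsession\"/></message>\n";

    // Returns {toserver count, createsession count} from a single pass.
    static int[] count(String text) {
        int[] counts = new int[2];
        Matcher m = P.matcher(text);
        while (m.find()) {
            if (m.group("toserver") != null) {
                counts[0]++;
            } else if (m.group("message") != null) {
                counts[1]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = count(LOG);
        System.out.println("to server: " + c[0] + ", createsession: " + c[1]);
    }
}
```

Because each part is matched on its own line, the order in which the two parts are logged no longer matters; the login xml is simply not matched.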

So, how do we ignore Store in DB: xmls? We can match them, while not creating a capture.

^Store in DB ?:\r?\n(?:.*\n)*?<\? *xml\b.*
  • It matches the literal Store in DB :, followed by
  • \r?\n(?:.*\n)*? as few lines as possible, until
  • <\? *xml\b.* it matches the first <?xml line
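The effect of this non-capturing branch can be checked on a small, made-up log in which one createsession xml follows a Store in DB : line. This is only a sketch: the sample text and the names `SkipBranchDemo` and `countCreateSession` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipBranchDemo {
    static final String XML =
        "<?xml version=\"1.0\"?><message><request type=\"createsession\"/></message>\n";

    // The second xml belongs to "Store in DB :" and should not be counted.
    static final String LOG =
        "Send message to server:\n" + XML
        + "Store in DB :\n"
        + "Created post notification log dir\n"
        + XML;

    // Branch that captures the xml we want to count.
    static final String MESSAGE =
        "(?<message><\\? *xml\\b.{10,500} type *= *\"createsession\")";
    // Branch that consumes "Store in DB :" plus its xml without capturing.
    static final String SKIP =
        "Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*";

    // Counts only the matches in which the "message" group participated.
    static int countCreateSession(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        int n = 0;
        while (m.find()) {
            if (m.group("message") != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int withoutSkip = countCreateSession("^" + MESSAGE, LOG);
        int withSkip = countCreateSession("^(?:" + SKIP + "|" + MESSAGE + ")", LOG);
        System.out.println("without skip branch: " + withoutSkip);
        System.out.println("with skip branch: " + withSkip);
    }
}
```

On this sample the count drops from 2 to 1 once the skip branch is added: the Store in DB : alternative consumes its xml line first, so the capturing branch never gets to see it.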

Regex

^(?:Store in DB ?:\r?\n(?:.*\n)*?<\? *xml\b.*|(?<toserver>Send message to server:)|(?<message><\? *xml\b.{10,500} type *= *\"createsession\"))

regex101 demo


Code

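import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The members below go inside a class; the class declaration is omitted here.
static final String create_session = "^(?:Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*|(?<toserver>Send message to server:)|(?<message><\\? *xml\\b.{10,500} type *= *\\\"createsession\\\"))";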

public static void main (String[] args) throws java.lang.Exception
{
    //for testing purposes
    final String text = "Send message to server:\nCreated post notification log dir\nCreated post notification log dir\nCreated post notification log dir\nSend message to server:\nCreated post notification log dir\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nStore in DB :\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nINFO [a] - Server Response: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><response xaction_guid=\"new xaction guid\" type=\"ok\"></params></response></message>\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nINFO [a] - Server Response: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><response xaction_guid=\"new xaction guid\" type=\"ok\"></response></message>";
    System.out.println("INPUT:\n" + text + "\n\nCOUNT:");
    StringBuilder b = new StringBuilder();
    b.append(text);

    findMatch(b,create_session,"create_session");
}

private static int findMatch(StringBuilder b,String pattern, String type) {
    int count =0;  // counter for "Send message to server:"
    int countType=0; // counter for "type=\"createsession\""
    Pattern regex = Pattern.compile(pattern,Pattern.MULTILINE);
    Matcher regexMatcher = regex.matcher(b.toString());
    while (regexMatcher.find()) {
        if (regexMatcher.group("toserver") != null) {
            count++;
        } else if (regexMatcher.group("message") != null) {
            countType++;
        } else {
            // Ignoring "Store in DB :\n<?xml...."
        }
    } 
    System.out.printf("%25s%2d\n%25s%2d\n", "to server: ", count, type+": ", countType);
    return countType;
}

Output

INPUT:
Send message to server:
Created post notification log dir
Created post notification log dir
Created post notification log dir
Send message to server:
Created post notification log dir
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
Store in DB :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></params></response></message>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></response></message>

COUNT:
              to server:  2
         create_session:  2
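
One detail worth checking when moving from the hard-coded test string to a real file: the ^ anchors rely on Pattern.MULTILINE, so the line breaks must still be present in the text being matched. BufferedReader.readLine() strips the line terminator, so a loop that appends each line without re-adding "\n" collapses the log into a single line. A minimal sketch of a reader that keeps them, with hypothetical helper names and a StringReader standing in for the log file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeepLineBreaks {
    // Reads all lines and re-appends '\n', which readLine() strips.
    static String readKeepingLineBreaks(BufferedReader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Counts lines that start with "Send message to server:".
    static int countSendLines(String text) {
        Matcher m = Pattern.compile("^Send message to server:", Pattern.MULTILINE)
                           .matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String raw = "Send message to server:\n"
                   + "Created post notification log dir\n"
                   + "Send message to server:\n";
        String kept = readKeepingLineBreaks(new BufferedReader(new StringReader(raw)));
        String collapsed = raw.replace("\n", ""); // what appending bare lines produces
        System.out.println("with line breaks: " + countSendLines(kept));
        System.out.println("collapsed: " + countSendLines(collapsed));
    }
}
```

With the line breaks kept, both Send message to server: lines are found; in the collapsed text only the one at the very start of the input matches.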

ideone demo

Upvotes: 1
