Reputation: 7067
Node0x7fd34984d728:s1 -> Node0x7fd34984d600:d0;
Node0x7fd34984d850 [shape=record,shape=Mrecord,label="{Register %vreg13|0x7fd34984d850|{<d0>i32}}"];
Node0x7fd34984d978 [shape=record,shape=Mrecord,label="{{<s0>0|<s1>1}|CopyFromReg [ORD=1]|0x7fd34984d978|{<d0>i32|<d1>ch}}"];
Node0x7fd34984d978:s0 -> Node0x7fd3486095f0:d0[color=blue,style=dashed];
Node0x7fd34984d978:s1 -> Node0x7fd34984d850:d0;
Node0x7fd34984daa0 [shape=record,shape=Mrecord,label="{Register %vreg14|0x7fd34984daa0|{<d0>i32}}"];
I'm trying to capture only the nodes that contain the "ORD" keyword. My simple regex pattern is:
Node.+?label=\"\\{\\{(?<SRC><s[0-9]+?>[a-z0-9]+?)\\}|(?<NAME>.+?)\\[ORD=(?<ORD>[0-9]+?)\\]\\|(?<ID>[A-Za-z0-9]{14})|\\{(?<DEST><d[0-9]+?>[a-z0-9]+?)\\}\\}\"\\];
It's too greedy and captures the wrong groups.
The following snippet is captured as one group:
Node0x7fd34984d728:s1 -> Node0x7fd34984d600:d0;
Node0x7fd34984d850 [shape=record,shape=Mrecord,label="{Register %vreg13|0x7fd34984d850|{<d0>i32}}"];
Node0x7fd34984d978 [shape=record,shape=Mrecord,label="{{<s0>0|<s1>1}|CopyFromReg [ORD=1]|0x7fd34984d978|{<d0>i32|<d1>ch}}"];
However, it should capture only:
Node0x7fd34984d978 [shape=record,shape=Mrecord,label="{{<s0>0|<s1>1}|CopyFromReg [ORD=1]|0x7fd34984d978|{<d0>i32|<d1>ch}}"];
as it's the only node that has the "ORD" keyword before the semicolon.
Upvotes: 3
Views: 106
Reputation: 626754
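You need to get rid of all lazy dot patterns (.+?) and replace them with negated character classes. A lazy quantifier only keeps the match as short as possible from where it starts, and the engine starts at the leftmost "Node" it can, so .+? keeps expanding until it reaches the next label="{{. A negated class such as [^\]\[]* cannot run past a square bracket, which prevents the match from "overflowing" between parts of your substrings.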
String pattern = "Node[^\\]\\[]*\\[[^\\]\\[]*label=\"\\{\\{(?<SRC>[^{}]*)\\}\\|(?<NAME>\\w+)\\s*\\[ORD=(?<ORD>\\d+)\\]\\|(?<ID>[^|]*)\\|\\{(?<DEST>[^{}]*)\\}\\}\"\\];";
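A minimal runnable sketch of the negated-character-class approach, applied to the sample lines from the question. The class name OrdNodeDemo and the helper matches are hypothetical. The LAZY pattern is an assumption used only to reproduce the question's overflow: it keeps just the first part of the question's pattern and turns on DOTALL so that . can cross line breaks, as a multi-line regex tester typically does. Note that inside a Java string literal every regex brace escape has to be written as \\{; a bare \{ is an illegal escape sequence and does not compile.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrdNodeDemo {
    // Sample lines from the question, joined with newlines.
    static final String INPUT = String.join("\n",
        "Node0x7fd34984d728:s1 -> Node0x7fd34984d600:d0;",
        "Node0x7fd34984d850 [shape=record,shape=Mrecord,label=\"{Register %vreg13|0x7fd34984d850|{<d0>i32}}\"];",
        "Node0x7fd34984d978 [shape=record,shape=Mrecord,label=\"{{<s0>0|<s1>1}|CopyFromReg [ORD=1]|0x7fd34984d978|{<d0>i32|<d1>ch}}\"];",
        "Node0x7fd34984d978:s0 -> Node0x7fd3486095f0:d0[color=blue,style=dashed];",
        "Node0x7fd34984d978:s1 -> Node0x7fd34984d850:d0;",
        "Node0x7fd34984daa0 [shape=record,shape=Mrecord,label=\"{Register %vreg14|0x7fd34984daa0|{<d0>i32}}\"];");

    // Assumed reduction of the question's lazy pattern; DOTALL lets '.' match newlines.
    static final Pattern LAZY = Pattern.compile("Node.+?label=\"\\{\\{", Pattern.DOTALL);

    // The answer's pattern: [^\]\[]* cannot cross '[' or ']', [^{}]* cannot cross braces,
    // and [^|]* cannot cross a field separator.
    static final Pattern NODE = Pattern.compile(
        "Node[^\\]\\[]*\\[[^\\]\\[]*label=\"\\{\\{(?<SRC>[^{}]*)\\}\\|(?<NAME>\\w+)\\s*"
        + "\\[ORD=(?<ORD>\\d+)\\]\\|(?<ID>[^|]*)\\|\\{(?<DEST>[^{}]*)\\}\\}\"\\];");

    // Collects the full text of every successive match of p in input.
    static List<String> matches(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The lazy match begins at the very first "Node" and spans three lines.
        System.out.println("Lazy: " + matches(LAZY, INPUT));
        Matcher m = NODE.matcher(INPUT);
        while (m.find()) {
            System.out.println("SRC=" + m.group("SRC") + " NAME=" + m.group("NAME")
                + " ORD=" + m.group("ORD") + " ID=" + m.group("ID") + " DEST=" + m.group("DEST"));
        }
    }
}
```

Run as-is, it should report a single NODE match with SRC=<s0>0|<s1>1, NAME=CopyFromReg, ORD=1, ID=0x7fd34984d978 and DEST=<d0>i32|<d1>ch. One caveat: [^\]\[]* still matches newlines, so if an edge line without brackets (such as Node0x7fd34984d978:s1 -> Node0x7fd34984d850:d0;) came directly before an ORD node, the match would start on that edge line. Adding \n to the negated classes (for example [^\]\[\n]*) keeps each match on a single line.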
Upvotes: 1
Reputation: 7948
I suggest not using one monster pattern, but two simple patterns to extract what you want.
Use this pattern first:
^Node.*?label="\{(.*\bORD\b.*)\}".*?;
to extract the label of the only node that has the "ORD" keyword before the semicolon:
{<s0>0|<s1>1}|CopyFromReg [ORD=1]|0x7fd34984d978|{<d0>i32|<d1>ch}
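A minimal Java sketch of this first step, assuming the whole file has been read into one string. Java does not enable MULTILINE by default, so Pattern.MULTILINE is turned on here to make ^ match at every line start; without DOTALL, . stops at line ends, so each match stays on its own line. The braces are escaped so the pattern compiles in Java, where an unescaped { can be rejected as an illegal repetition. The class OrdLabelStep and helper ordLabels are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrdLabelStep {
    // Sample lines from the question, joined with newlines.
    static final String INPUT = String.join("\n",
        "Node0x7fd34984d728:s1 -> Node0x7fd34984d600:d0;",
        "Node0x7fd34984d850 [shape=record,shape=Mrecord,label=\"{Register %vreg13|0x7fd34984d850|{<d0>i32}}\"];",
        "Node0x7fd34984d978 [shape=record,shape=Mrecord,label=\"{{<s0>0|<s1>1}|CopyFromReg [ORD=1]|0x7fd34984d978|{<d0>i32|<d1>ch}}\"];",
        "Node0x7fd34984d978:s0 -> Node0x7fd3486095f0:d0[color=blue,style=dashed];",
        "Node0x7fd34984d978:s1 -> Node0x7fd34984d850:d0;",
        "Node0x7fd34984daa0 [shape=record,shape=Mrecord,label=\"{Register %vreg14|0x7fd34984daa0|{<d0>i32}}\"];");

    // Step 1: keep only lines whose label mentions ORD; group 1 is the label contents
    // between the outer braces.
    static final Pattern ORD_LABEL = Pattern.compile(
        "^Node.*?label=\"\\{(.*\\bORD\\b.*)\\}\".*?;", Pattern.MULTILINE);

    // Returns group 1 of every matching line, in input order.
    static List<String> ordLabels(String input) {
        List<String> labels = new ArrayList<>();
        Matcher m = ORD_LABEL.matcher(input);
        while (m.find()) {
            labels.add(m.group(1));
        }
        return labels;
    }

    public static void main(String[] args) {
        ordLabels(INPUT).forEach(System.out::println);
    }
}
```

For the sample input this should yield exactly one label, the same string shown above.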
Then use this pattern
(\{.+?\}|[^\|]+(?=\[ORD=\d+\])|[^\|]+)
to pull out your various fields. Note that they come back as numbered matches rather than named groups.
Results:
MATCH 1
{<s0>0|<s1>1}
MATCH 2
CopyFromReg
MATCH 3
[ORD=1]
MATCH 4
0x7fd34984d978
MATCH 5
{<d0>i32|<d1>ch}
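The second step can be sketched in Java the same way. The class OrdFieldSplit and helper fields are hypothetical, the braces in the pattern are escaped for Java, and the trim() call is an addition of this sketch: the lookahead alternative stops right before "[ORD=", so on its own it returns "CopyFromReg " with a trailing space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrdFieldSplit {
    // Step 2: a brace-delimited block, the text just before "[ORD=n]", or any other field
    // between '|' separators. Alternatives are tried left to right at each position.
    static final Pattern FIELD = Pattern.compile("(\\{.+?\\}|[^|]+(?=\\[ORD=\\d+\\])|[^|]+)");

    // Splits a label (as produced by step 1) into its fields, in order.
    static List<String> fields(String label) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(label);
        while (m.find()) {
            // trim() removes the space the lookahead alternative leaves before "[ORD=".
            out.add(m.group(1).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String label = "{<s0>0|<s1>1}|CopyFromReg [ORD=1]|0x7fd34984d978|{<d0>i32|<d1>ch}";
        List<String> parts = fields(label);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("MATCH " + (i + 1) + ": " + parts.get(i));
        }
    }
}
```

Applied to the label from the first step, this should give the same five fields as the results listed above.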
Upvotes: 1