Reputation: 741
I have a dataset that looks like this:
I(0,123...789){
A(0,567...999){.......n=Marc.....}
B(2,655...265){..................}
C(3,993...333){..................}
M(8,635...254){.................;}
}
O(0,345...789){
A(0,567...999){.......n=Marc.....}
B(2,876...775){..................}
C(3,993...549){..................}
M(8,354...987){.................;}
}
I(0,987...764){
A(0,567...999){.......n=Marc.....}
B(2,543...265){..................}
C(7,998...933){..................}
M(8,645...284){.................;}
}
B(0,123...789){
.......
}
I(0,987...764){
A(0,567...999){.......n=John.....}
B(2,543...265){..................}
C(7,998...933){..................}
M(8,645...284){.................;}
}
I am trying to return all "I" sections, that is, everything from the "I" line up to the closing } that comes after the ;} line, but only if the "I" section contains n=Marc.
So far I have come up with:
^([I]\(.*\){.*n=Marc.*^[M]\(.*;}.)}
https://regex101.com/r/VSuZh5/1
However, in some cases, when the data looks like this:
I(0,123...789){
A(0,567...999){.......n=Marc.....}
B(2,655...265){..................}
C(3,993...333){..................}
M(8,635...254){.................;}
}
O(0,345...789){
A(0,567...999){.......n=Marc.....}
B(2,876...775){..................}
C(3,993...549){..................}
M(8,354...987){.................;}
}
the regular expression returns both the I and the O sections. Is there a way to make sure it always returns only the I section?
Upvotes: 2
Views: 64
Reputation: 18490
If I knew the input was always formatted like the sample, I would rather split it into chunks at a closing } at the start of a line, followed by a newline, if an uppercase letter comes next: ^}\R(?=[A-Z])
Then find the items that start with I and contain n=Marc by using preg_grep.
$res = preg_grep('/^I.*n=Marc/s', preg_split('/^}\R(?=[A-Z])/m', $str));
In your pattern, the .* can skip over undesired items (including the closing } of a section), resulting in unexpected matches.
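As a rough end-to-end sketch of the split-then-filter idea above: the sample data is shortened from the question, and the final step that puts the closing } back is an addition for illustration (preg_split consumes the } it splits on), not part of the one-liner above. The variable names $chunks and $sections are made up for this example.

```php
<?php
// Sample input modelled on the question; the dotted payloads are placeholders.
$str = <<<'TXT'
I(0,123...789){
A(0,567...999){.......n=Marc.....}
M(8,635...254){.................;}
}
O(0,345...789){
A(0,567...999){.......n=Marc.....}
M(8,354...987){.................;}
}
I(0,987...764){
A(0,567...999){.......n=John.....}
M(8,645...284){.................;}
}
TXT;

// Split at a "}" that starts a line and is followed by a newline
// and then an uppercase letter (the start of the next section).
$chunks = preg_split('/^}\R(?=[A-Z])/m', $str);

// Keep only chunks that start with "I" and contain n=Marc.
// The s flag lets "." match across newlines inside a chunk.
$res = preg_grep('/^I.*n=Marc/s', $chunks);

// The split removed the closing "}" from every chunk except the last one,
// so normalise: drop a trailing "}" line if present, then append one.
$sections = array_map(
    function ($chunk) {
        return preg_replace('/\R}\s*\z/', '', rtrim($chunk)) . "\n}";
    },
    array_values($res)
);

foreach ($sections as $section) {
    echo $section, "\n";
}
```

Only the first I section is printed here, because the O section fails the ^I check and the second I section has no n=Marc.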
Upvotes: 2
Reputation: 163277
One option might be to match I, then match all the lines that do not start with }, and match at least 1 line that contains n=Marc:
^I\([^()]*\){(?:\R(?!}|.*n=Marc).*)*\R.*\bn=Marc\b.*(?:\R(?!}).*)*\R}$
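A minimal sketch of how this pattern might be tried in PHP, assuming it is run with the m flag so that ^ and $ anchor at line boundaries. The sample data is shortened from the question, and the variable names are made up for this example.

```php
<?php
// Sample input modelled on the question; the dotted payloads are placeholders.
$data = <<<'TXT'
I(0,123...789){
A(0,567...999){.......n=Marc.....}
M(8,635...254){.................;}
}
O(0,345...789){
A(0,567...999){.......n=Marc.....}
M(8,354...987){.................;}
}
I(0,987...764){
A(0,567...999){.......n=John.....}
M(8,645...284){.................;}
}
TXT;

// The pattern from above, wrapped in delimiters with the m (multiline) flag.
$pattern = '/^I\([^()]*\){(?:\R(?!}|.*n=Marc).*)*\R.*\bn=Marc\b.*(?:\R(?!}).*)*\R}$/m';

// $found[0] holds the full text of every matching section.
preg_match_all($pattern, $data, $found);

foreach ($found[0] as $section) {
    echo $section, "\n";
}
```

Because every inner line is matched with .* (which stops at a newline) and each newline is guarded by (?!}), a match cannot run past the closing } of the section it started in, so the O section is never pulled in.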
Explanation
^ - Start of string (start of a line when the m flag is set)
I\([^()]*\){ - Match I followed by (...){
(?: - Non-capturing group
\R(?!}|.*n=Marc) - Match a Unicode newline sequence and assert that the next line neither starts with } nor contains n=Marc
.* - Match any char except a newline 0+ times
)* - Close the non-capturing group and repeat it 0+ times
\R - Match a Unicode newline sequence
.*\bn=Marc\b.* - Match a line that contains n=Marc between word boundaries
(?: - Non-capturing group
\R(?!}).* - Match a newline sequence, assert that the next line does not start with }, then match the rest of that line
)* - Close the non-capturing group and repeat it 0+ times
\R - Match a newline sequence
} - Match the closing }
$ - End of string (end of a line when the m flag is set)
Upvotes: 3
Reputation: 27723
My guess is that we want an expression to return the O section that has n=Marc in it, something similar to:
(?=O\()([\s\S]*?n=Marc[\s\S]*?;}\s*})
Or maybe:
(?=O\()([\s\S]*?n=Marc[\s\S]*?;})\s*}
For I sections we'd simply change O to I:
(?=I\()([\s\S]*?n=Marc[\s\S]*?;})\s*}
$re = '/(?=I\()([\s\S]*?n=Marc[\s\S]*?;})\s*}/m';
$str = 'I(0,123...789){
A(0,567...999){.......n=Marc.....}
B(2,655...265){..................}
C(3,993...333){..................}
M(8,635...254){.................;}
}
O(0,345...789){
A(0,567...999){.......n=Marc.....}
B(2,876...775){..................}
C(3,993...549){..................}
M(8,354...987){.................;}
}
I(0,987...764){
A(0,567...999){.......n=Marc.....}
B(2,543...265){..................}
C(7,998...933){..................}
M(8,645...284){.................;}
}
B(0,123...789){
.......
}
I(0,987...764){
A(0,567...999){.......n=John.....}
B(2,543...265){..................}
C(7,998...933){..................}
M(8,645...284){.................;}
}';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $key => $I) {
echo $I[0] . "\n";
}
Output:
I(0,123...789){
A(0,567...999){.......n=Marc.....}
B(2,655...265){..................}
C(3,993...333){..................}
M(8,635...254){.................;}
}
I(0,987...764){
A(0,567...999){.......n=Marc.....}
B(2,543...265){..................}
C(7,998...933){..................}
M(8,645...284){.................;}
}
Upvotes: 0