Reputation: 13
I have a (strange) string like:
EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR
The pattern I need to look for can only be defined by keywords: EREF+, MREF+, CRED+ and others. I know there are 19 keywords, but the string may contain different subsets of these 19 keywords. I don't know if the order stays the same; from what I can tell, EREF+ will most likely be the first keyword, but the order may just as well differ. I also don't know which of the 19 keywords might be the last one in the string, as that may change case by case.
My first approach was to just use explode() twice, with keyword 1 and keyword 2 – but if the keywords change order (and I cannot guarantee they don't) I would have to go through all possible combinations.
Anyway, here's the first (working) code I used:
<?php
$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";

// Return $start plus the text between the first $start and the next $end.
function getBetween($content, $start, $end) {
    $r = explode($start, $content);
    if (isset($r[1])) {
        $r = explode($end, $r[1]);
        return $start . $r[0];
    }
    return '';
}

$start = "EREF+";
$end = "MREF+";
$output = getBetween($string, $start, $end);
echo $output;
?>
So now I am looking into regex to come up with a solution that extracts a substring between two keywords, where any of the keywords can be the start delimiter while any other keyword may be the end delimiter.
Since there are literally thousands of regex questions around, I took some time and tried to adapt other solutions, but with no success so far. I must confess regex is voodoo to me and I cannot seem to remember the patterns for more than a minute. I found this thread, which is pretty close to what I am trying to achieve, and tried a few tweaks, but I cannot get it to work properly.
Here's my code so far:
<?php
$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";
$matches = array();
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];
$pattern = sprintf('/(?:%s):(.*?)/', join('|', array_map(function($keyword) {
    return preg_quote($keyword, '/');
}, $keywords)));
preg_match_all($pattern, $string, $matches);
print_r($matches);
?>
... whereas the constructed pattern looks like this:
/(?:EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+):(.*?)/
Can anyone advise please? Any help appreciated!
Thanks
Upvotes: 1
Views: 278
Reputation: 443
You can use this regex:
/(?<=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+)(.+?)(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$)/
It will match the strings between defined keywords.
(?<=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+)   # look behind for a keyword
(.+?)                                     # match one or more characters, non-greedy
(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$)  # look ahead for a keyword or the end of the string
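As a rough sketch of how this could be wired into the code from the question, the pattern can be built from the keyword list rather than typed out by hand. The variable name $alternation is my own; everything else comes from the question. This assumes the values never contain a keyword themselves and that the string has no line breaks (the dot does not match newlines by default).

```php
<?php
// Sample string and keyword list taken from the question.
$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];

// Escape each keyword (the "+" is a regex metacharacter) and join
// the results into a single alternation. $alternation is a name
// chosen for this sketch.
$alternation = implode('|', array_map(function ($keyword) {
    return preg_quote($keyword, '/');
}, $keywords));

// Lookbehind for a keyword, lazy match, lookahead for the next keyword
// or the end of the string.
$pattern = '/(?<=' . $alternation . ')(.+?)(?=' . $alternation . '|$)/';

preg_match_all($pattern, $string, $matches);

// $matches[1] holds the text found after each keyword, in string order.
print_r($matches[1]);
```

Because the keywords only appear inside lookarounds, they are not part of the matches, so the order in which they occur in the string should not matter.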
Edit: If you want to know what keyword caused the split you can use this regex:
/((?:EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+))(.+?)(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$)/
It captures the keyword in group 1 and the text up to the next keyword (or the end of the string) in group 2.
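If it helps, here is a minimal sketch of turning those two groups into a keyword => value array. The names $alternation and $fields are made up for this example, and it assumes each keyword occurs at most once in the string (a repeated keyword would overwrite the earlier value).

```php
<?php
// Sample string and keyword list taken from the question.
$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];

// Escaped keywords joined into one alternation ($alternation is a
// name chosen for this sketch).
$alternation = implode('|', array_map(function ($keyword) {
    return preg_quote($keyword, '/');
}, $keywords));

// Group 1: the keyword. Group 2: the text up to the next keyword or
// the end of the string.
$pattern = '/(' . $alternation . ')(.+?)(?=' . $alternation . '|$)/';

// PREG_SET_ORDER yields one array per match: [full match, keyword, text].
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

$fields = [];
foreach ($matches as $match) {
    $fields[$match[1]] = $match[2];
}

print_r($fields);
```

With the lookup array in hand, a single field can be read with something like $fields['SVWZ+'] ?? null, regardless of where that keyword appeared in the string.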
Upvotes: 1