Reputation: 405
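I would like to capture each of these parts in its own group with preg_match_all in PHP: the label (Ch, Sect or Pg), the number, an optional letter, and the conjunction (and/or) that follows.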
Keeping in mind that I want to ignore all book titles and the number of items in the string may be dynamic, the regex should work on all the examples below:
This is what I managed to come up with so far:
$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
preg_match_all ('/([a-z]+)(?=\d|\d\s)\s*(\d*)\s*(?<=\d|\d\s)([a-z]?).*?(and|or)?/i', $str, $matches);
Array
(
    [0] => Array
        (
            [0] => Pg3
        )
    [1] => Array
        (
            [0] => Pg
        )
    [2] => Array
        (
            [0] => 3
        )
    [3] => Array
        (
            [0] =>
        )
    [4] => Array
        (
            [0] =>
        )
)
The expected result should be:
Array
(
    [0] => Array
        (
            [0] => Ch 1 a and
            [1] => Sect 2b and
            [2] => Pg3
        )
    [1] => Array
        (
            [0] => Ch
            [1] => Sect
            [2] => Pg
        )
    [2] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )
    [3] => Array
        (
            [0] => a
            [1] => b
            [2] =>
        )
    [4] => Array
        (
            [0] => and
            [1] => and
            [2] =>
        )
)
Upvotes: 0
Views: 292
Reputation: 7470
This is how I would do it.
$arr = array(
    'Ch1 and Sect2b',
    'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3',
    'Ch 4 x unwantedtitle and Sect 5y unwanted title and' .
    ' Sect6 z and Ch7 or Ch8a',
    'Assume this is ch1a and ch 2 or ch seCt 5c.' .
    ' Then SECT or chA pg22a and pg 13 andor'
);
foreach ($arr as $a) {
    var_dump($a);
    preg_match_all(
        '~
        \b(?P<word>ch|sect|(pg))
        \s*(?P<number>\d+)
        (?(2)\b|
            \s*
            (?P<letter>(?!(?<=\s)(?:and|or)\b)[a-z]+)?
            \s*
            (?:(?<=\s)(?P<cond>and|or)\b)?
        )
        ~xi',
        $a,
        $m
    );
    // "Beautify" the result array by dropping the numeric duplicates of the
    // named groups; note that $m[0] will still return the whole matches.
    foreach ($m as $k => $v) {
        if (is_numeric($k) && $k !== 0) unset($m[$k]);
    }
    print_r($m);
}
I had to turn pg into a capturing group because it needs a condition of its own: a number may follow it (with or without whitespace in between), but no letter may follow that number, since a page indicator such as "pg23a" is not valid.
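The pg condition described above can be tried in isolation. The following is only a minimal sketch: the reduced pattern is hypothetical (it is not the full pattern from this answer) and keeps just the label, the number and the conditional, so that the effect of `(?(1)...|...)` is easier to see.

```php
<?php
// Hypothetical reduced pattern, not the full one from the answer.
// Group 1 participates in the match only when the label is "pg".
// If it did, a word boundary must follow the number; otherwise an
// optional single letter is allowed after optional whitespace.
$pattern = '~\b(?:ch|sect|(pg))\s*(\d+)(?(1)\b|\s*([a-z])?)~i';

foreach (['pg 13', 'pg22a', 'ch1a', 'Sect 2b'] as $sample) {
    $found = preg_match($pattern, $sample, $m);
    echo $sample, ' => ', $found ? $m[0] : '(no match)', PHP_EOL;
}
```

With this sketch, "pg22a" is rejected because no word boundary can follow the digits, while "pg 13", "ch1a" and "Sect 2b" are matched in full.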
That's why I chose to name each group and "beautify" the result with the inner foreach loop in the code. If you use numeric indexes instead of named ones, you will need to skip $m[2], which only holds the pg capture.
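As an alternative to the inner foreach loop, the numeric duplicates could also be dropped with array_filter. This is a sketch only: the helper name keep_named_groups and the shortened pattern are made up for illustration, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: keep the whole-match entry (key 0) and every named
// group, and drop the numbered duplicates that preg_match_all adds.
function keep_named_groups(array $matches): array
{
    return array_filter(
        $matches,
        fn($key) => $key === 0 || is_string($key),
        ARRAY_FILTER_USE_KEY
    );
}

// Shortened illustrative pattern with two named groups.
preg_match_all('~(?P<word>ch|sect)\s*(?P<number>\d+)~i', 'Ch 1 and Sect 2', $m);
$clean = keep_named_groups($m);
print_r(array_keys($clean));
```

array_filter preserves the original keys, so the named groups stay addressable as $clean['word'] and $clean['number'].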
As an example, here is the output for the last item in $arr:
Array
(
    [0] => Array
        (
            [0] => ch1a and
            [1] => ch 2 or
            [2] => seCt 5c
            [3] => pg 13
        )
    [word] => Array
        (
            [0] => ch
            [1] => ch
            [2] => seCt
            [3] => pg
        )
    [number] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 5
            [3] => 13
        )
    [letter] => Array
        (
            [0] => a
            [1] =>
            [2] => c
            [3] =>
        )
    [cond] => Array
        (
            [0] => and
            [1] => or
            [2] =>
            [3] =>
        )
)
Upvotes: 0
Reputation: 21325
This is the closest I could get:
$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
preg_match_all ('/((Ch|Sect|Pg)\s?(\d+)\s?(\w?))(.*?(and|or))?/i', $str, $matches);
Array
(
    [0] => Array
        (
            [0] => Ch 1 a unwantedtitle and
            [1] => Sect 2b unwanted title and
            [2] => Pg3
        )
    [1] => Array
        (
            [0] => Ch 1 a
            [1] => Sect 2b
            [2] => Pg3
        )
    [2] => Array
        (
            [0] => Ch
            [1] => Sect
            [2] => Pg
        )
    [3] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )
    [4] => Array
        (
            [0] => a
            [1] => b
            [2] =>
        )
    [5] => Array
        (
            [0] => unwantedtitle and
            [1] => unwanted title and
            [2] =>
        )
    [6] => Array
        (
            [0] => and
            [1] => and
            [2] =>
        )
)
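The whole matches above still contain the unwanted titles, so one possible follow-up is to ignore $matches[0] and read only the numbered groups. The sketch below is an assumption-laden variation, not the pattern from this answer: the helper groups are made non-capturing, word boundaries are added, the letter is restricted to [a-z], and parse_refs is a hypothetical helper name.

```php
<?php
// Assumed variation of the pattern above: only label, number, letter and
// conjunction are captured; the title text is skipped by a lazy .*?
$pattern = '/\b(Ch|Sect|Pg)\s?(\d+)\s?([a-z]?)\b(?:.*?\b(and|or)\b)?/i';

// Hypothetical helper: one associative array per reference found.
function parse_refs(string $str, string $pattern): array
{
    preg_match_all($pattern, $str, $sets, PREG_SET_ORDER);
    $refs = [];
    foreach ($sets as $set) {
        $refs[] = [
            'word'   => $set[1],
            'number' => $set[2],
            // Trailing groups that did not participate are omitted by PCRE.
            'letter' => $set[3] ?? '',
            'cond'   => $set[4] ?? '',
        ];
    }
    return $refs;
}

$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
$refs = parse_refs($str, $pattern);
print_r($refs);
```

PREG_SET_ORDER groups the captures per match rather than per group, which makes it straightforward to rebuild a string such as "Ch 1 a and" from the parts if that form is needed.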
Upvotes: 0