Max

Reputation: 93

How to get an array of all image numbers and image letters with regex?

I need to get all referenced images while parsing HTML in PHP, based on an expression formatted like this:

(fig. 8a-c, 9b-c)

I would like to capture this with a regex in order to output an array such as:

array(
    8 => ['a', 'b', 'c'],
    9 => ['b', 'c'],
)

The expression can also take other forms, such as:

(fig. 8)
(fig. 8,9)
(fig. 11a, b)

Here is the regex I have at the moment, but it does not work for every case:

https://regex101.com/r/ShqlnY/3/

Can you help me get an array containing all referenced images? Thanks.

Upvotes: 2

Views: 58

Answers (3)

Max

Reputation: 93

Thanks, I ended up with a regular expression like this:

'/(?:\(fig\.\h*|\G(?!^))(\d+)([a-z])?(?:-([a-z])?)?(?:,\h*)?(?=[^)]*\))/m'

used with preg_match_all.
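A minimal sketch of how the matches of the pattern above could be turned into the array asked for in the question. The helper name figureMap() is hypothetical, and treating a bare number such as (fig. 8) as an empty letter list is an assumption, not something stated in the question.

```php
<?php
// Sketch: build [figure number => letters] from "(fig. ...)" references.
// figureMap() is a hypothetical helper; the pattern is the one shown above.
function figureMap(string $html): array
{
    $pattern = '/(?:\(fig\.\h*|\G(?!^))(\d+)([a-z])?(?:-([a-z])?)?(?:,\h*)?(?=[^)]*\))/m';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $number = (int) $m[1];
        $from = $m[2] ?? ''; // first letter, '' when absent
        $to   = $m[3] ?? ''; // last letter of a range, '' when absent

        if ($from !== '' && $to !== '') {
            $letters = range($from, $to); // "a-c" becomes a, b, c
        } elseif ($from !== '') {
            $letters = [$from];           // single letter such as "11a"
        } else {
            $letters = [];                // assumption: bare number such as "(fig. 8)"
        }

        // Merge in case the same figure number appears more than once.
        $result[$number] = array_merge($result[$number] ?? [], $letters);
    }
    return $result;
}

print_r(figureMap('(fig. 8a-c, 9b-c)'));
```

With this pattern, figureMap('(fig. 11a, b)') returns [11 => ['a']]: after 11a the \G branch requires digits first, so the bare b is not picked up.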

Upvotes: 1

The fourth bird

Reputation: 163287

For your example data, you could use range() together with a pattern that has 3 capturing groups, where the third group is optional.

If the third group does not exist, return the single letter in an array; otherwise, use the second and third groups to create a range.

(?:^\(fig\.\h*|\G(?!^))(\d+)([a-z])(?:-([a-z])?)?(?:,\h*)?(?=[^)]*\))
  • (?: Non capturing group
    • ^\(fig\.\h* Match the start of a line (the m flag is set) and (fig. followed by 0+ horizontal whitespace chars
    • | Or
    • \G(?!^) Assert position at the end of the previous match, but not at the start of a line
  • ) Close non capturing group
  • (\d+)([a-z]) Capture 1+ digits in group 1, capture a-z in group 2
  • (?: Non capturing group
    • -([a-z])? Match - and optionally capture a-z in group 3
  • )? Close non capturing group and make it optional
  • (?:,\h*)? Match an optional , and 0+ horizontal whitespace chars
  • (?=[^)]*\)) Assert that a closing parenthesis follows, with no other ) in between

Regex demo

For example:

$pattern = "/(?:^\(fig\.\h*|\G(?!^))(\d+)([a-z])(?:-([a-z])?)?(?:,\h*)?(?=[^)]*\))/m";
$str = '(fig. 8a-c, 9b-c)
(fig. 8)
(fig. 8,9)
(fig. 11a, b)';
preg_match_all($pattern, $str, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE, 0);

// With PREG_OFFSET_CAPTURE every group is a [text, offset] pair.
$matches = array_map(function($x){
    if (isset($x[3][0])) {
        // Range such as "8a-c": expand the letters with range().
        return [
            $x[1][0] => range($x[2][0], $x[3][0]),
            "start" => $x[1][1],
            "end" => $x[3][1]
        ];
    }
    // Single letter such as "11a".
    return [
        $x[1][0] => [$x[2][0]],
        "start" => $x[1][1],
        "end" => $x[2][1]
    ];

}, $matches);

print_r($matches);

Result

Array
(
    [0] => Array
        (
            [8] => Array
                (
                    [0] => a
                    [1] => b
                    [2] => c
                )

            [start] => 6
            [end] => 9
        )

    [1] => Array
        (
            [9] => Array
                (
                    [0] => b
                    [1] => c
                )

            [start] => 12
            [end] => 15
        )

    [2] => Array
        (
            [11] => Array
                (
                    [0] => a
                )

            [start] => 44
            [end] => 46
        )

)

See a PHP demo.
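The \G(?!^) branch in the pattern above is what lets each match start exactly where the previous one ended, so the chain of numbers only continues while the separators are unbroken. The sketch below uses a simplified, illustrative pattern (not the one from this answer) to show that effect in isolation.

```php
<?php
// Sketch: \G(?!^) continues only from the end of the previous match.
// Simplified illustrative pattern: "(fig." starts the chain, ", <digits>" continues it,
// and \K keeps only the digits in the reported match.
$pattern = '~(?:\(fig\.\h*|\G(?!^),\h*)\K\d+~';
preg_match_all($pattern, '(fig. 8, 9, 10) 11', $matches);

// The chain breaks at ")", so the 11 outside the parentheses is not matched.
print_r($matches[0]); // 8, 9 and 10
```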

Upvotes: 0

Wiktor Stribiżew

Reputation: 626794

You may use

'~(?:\G(?!^),\s*|\(fig\.)\s*\K([0-9]{1,3})([a-z]-[a-z])~'

with preg_match_all to get all the letter ranges inside a (fig. ...) substring (see the regex demo), and then post-process the matches with this code:

$rx = "~(?:\G(?!^),\s*|\(fig\.)\s*\K([0-9]{1,3})([a-z]-[a-z])~";
$s = "(fig. 8a-c, 9b-c)";
preg_match_all($rx, $s, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER, 0);

foreach ($matches as $m) {
    $result = [];
    $result[] = $m[0][1]; // Position of the match
    $result[] = $m[1][0]; // The number
    $kv = explode("-", $m[2][0]); // "a-c" becomes ["a", "c"]
    $result = array_merge($result, buildNumChain($kv));
    print_r($result);
}

// Expand a [first, last] letter pair into every letter in between.
function buildNumChain($arr) {
    $ret = [];
    foreach(range($arr[0], $arr[1]) as $letter) {
        $ret[] = $letter;
    }
    return $ret;
}

Output:

Array ( [0] => 6  [1] => 8 [2] => a [3] => b [4] => c )
Array ( [0] => 12 [1] => 9 [2] => b [3] => c )

See the PHP demo.

Regex details

  • (?:\G(?!^),\s*|\(fig\.) - either the end of the previous match followed by , and 0+ whitespaces, or (fig.
  • \s* - 0+ whitespaces
  • \K - match reset operator: the text matched so far is dropped from the reported match
  • ([0-9]{1,3}) - Group 1: 1 to 3 digits
  • ([a-z]-[a-z]) - Group 2: a lowercase letter, - and a lowercase letter.
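The \K operator is why the positions in the output above are 6 and 12: they are the offsets of the digits, not of (fig. or the preceding comma. A minimal sketch comparing a simplified pattern with and without \K (both patterns are illustrative, not taken from the answer):

```php
<?php
// Sketch: \K discards "(fig. " from the reported match, so PREG_OFFSET_CAPTURE
// reports the offset of the digits themselves.
preg_match('~\(fig\.\s*\K\d+~', '(fig. 8a-c)', $m, PREG_OFFSET_CAPTURE);
print_r($m[0]); // [0] => 8, [1] => 6

// Without \K the whole "(fig. 8" is reported, starting at offset 0.
preg_match('~\(fig\.\s*\d+~', '(fig. 8a-c)', $n, PREG_OFFSET_CAPTURE);
print_r($n[0]); // [0] => (fig. 8, [1] => 0
```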

Upvotes: 0
