Fbr

Reputation: 5

PHP - Regex optimization split string in parts

In PHP, I am trying to write a regex that splits a string into parts and returns them as array elements.

For example, these are my strings:

$string1 = "For a serving of 100 g Sugars: 2.3 g (Approximately)";
$string2 = "For a serving of 100 g Saturated Fat: 5.8 g (Approximately)";
$string3 = "For a portion of 100 g Energy Value: 290 kcal (Approximately)";

I want to extract specific information from these strings:

$arrayString1 = array('100 g','Sugars', '2.3 g');
$arrayString2 = array('100 g','Saturated Fat', '5.8 g');
$arrayString3 = array('100 g','Energy Value', '290 kcal');

I wrote this regex:

(^For a serving of )([\d g]*)([^:]*)(: )([\d.\d]*)( )([a-z]*)

Do you have any idea how to optimize this regex?

Thanks

Upvotes: 0

Views: 46

Answers (1)

The fourth bird

Reputation: 163362

You could make the pattern a bit more specific by matching the digits and the units g or kcal explicitly.

To match all three examples, you can use an alternation for the word that varies: (?:serving|portion)

Instead of using 7 capturing groups, you can use 3 capturing groups.

You can omit the first capturing group (^For a serving of ) and combine the digits and the unit into a single group.
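
To see why fewer groups and the alternation help, here is a small sketch that runs the pattern from the question as it stands. The `~` delimiters and the variable names are assumptions added for illustration; the pattern itself is unchanged.

```php
<?php
// The pattern from the question, wrapped in ~ delimiters for preg_match.
$original = '~(^For a serving of )([\d g]*)([^:]*)(: )([\d.\d]*)( )([a-z]*)~';

preg_match($original, 'For a serving of 100 g Sugars: 2.3 g (Approximately)', $m);
var_dump(count($m)); // int(8): the full match plus 7 groups
var_dump($m[2]);     // string(6) "100 g " with a trailing space

// The third sample says "portion", so the fixed prefix does not match.
$matched = preg_match($original, 'For a portion of 100 g Energy Value: 290 kcal (Approximately)', $m3);
var_dump($matched);  // int(0)
```

The amount, the value and the unit end up in separate groups, so the desired three-element array would still have to be assembled by hand.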

^For\h+a\h+(?:serving|portion)\h+of\h+(\d+\h+g)\h+([^:\r\n]+):\h+(\d+(?:\.\d+)?\h+(?:g|kcal))\b
  • ^ Start of string
  • For\h+a\h+(?:serving|portion)\h+of\h+ Match the beginning of the string with either serving or portion
  • (\d+\h+g)\h+ Capture group 1, match 1+ digits, 1+ horizontal whitespace chars and g
  • ([^:\r\n]+):\h+ Capture group 2, match 1+ times any char except : or a newline, followed by matching : and 1+ horizontal whitespace chars
  • ( Capture group 3
    • \d+(?:\.\d+)? Match 1+ digits with an optional decimal part
    • \h+(?:g|kcal) Match 1+ horizontal whitespace chars and either g or kcal
  • )\b Close group 3, followed by a word boundary to prevent the unit from being part of a longer word
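
The breakdown above can also be kept next to the pattern itself. This is a minimal sketch, assuming the `x` (extended) modifier is acceptable in your code base: it makes the regex engine ignore unescaped whitespace and `#` comments in the pattern, which is safe here because the pattern matches spaces with `\h+` rather than literal spaces. The sample string and variable names are for illustration only.

```php
<?php
// Same pattern, spread over several lines with the x modifier
// so that each part carries its own comment.
$pattern = '~
    ^For\h+a\h+(?:serving|portion)\h+of\h+   # fixed prefix with serving or portion
    (\d+\h+g)\h+                             # group 1: reference amount such as 100 g
    ([^:\r\n]+):\h+                          # group 2: nutrient name up to the colon
    (\d+(?:\.\d+)?\h+(?:g|kcal))\b           # group 3: value followed by g or kcal
~x';

$ok = preg_match($pattern, 'For a serving of 100 g Saturated Fat: 5.8 g (Approximately)', $m);

// Skip index 0 (the full match) and keep the three capture groups.
print_r(array_slice($m, 1));
```

The comments must not contain the `~` delimiter, since that would end the pattern early.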

For example

// Single quotes pass the backslash escapes to the regex engine unchanged.
$pattern = '~^For\h+a\h+(?:serving|portion)\h+of\h+(\d+\h+g)\h+([^:\r\n]+):\h+(\d+(?:\.\d+)?\h+(?:g|kcal))\b~';
$strings = [
    "For a serving of 100 g Sugars: 2.3 g (Approximately)",
    "For a serving of 100 g Saturated Fat: 5.8 g (Approximately)",
    "For a portion of 100 g Energy Value: 290 kcal (Approximately)"
];

foreach ($strings as $string) {
    // Only use $matches when the string actually matched.
    if (preg_match($pattern, $string, $matches)) {
        array_shift($matches); // drop the full match, keep the 3 groups
        print_r($matches);
    }
}

Output

Array
(
    [0] => 100 g
    [1] => Sugars
    [2] => 2.3 g
)
Array
(
    [0] => 100 g
    [1] => Saturated Fat
    [2] => 5.8 g
)
Array
(
    [0] => 100 g
    [1] => Energy Value
    [2] => 290 kcal
)

Upvotes: 2
