Reputation: 5
In PHP, I am trying to write a regex that splits a string into parts, returned as array elements.
For example, these are my strings:
$string1 = "For a serving of 100 g Sugars: 2.3 g (Approximately)";
$string2 = "For a serving of 100 g Saturated Fat: 5.8 g (Approximately)";
$string3 = "For a portion of 100 g Energy Value: 290 kcal (Approximately)";
And I want to extract specific information from these strings:
$arrayString1 = array('100 g','Sugars', '2.3 g');
$arrayString2 = array('100 g','Saturated Fat', '5.8 g');
$arrayString3 = array('100 g','Energy Value', '290 kcal');
I wrote this regex:
(^For a serving of )([\d g]*)([^:]*)(: )([\d.\d]*)( )([a-z]*)
Do you have any idea how to optimize this regex?
Thanks
Upvotes: 0
Views: 46
Reputation: 163362
You could make the pattern more specific by matching the digits and the unit (g or kcal) explicitly.
To match all of the examples, use an alternation for the two wordings: (?:serving|portion)
Instead of 7 capturing groups, you can use 3 capturing groups.
You can omit the first capturing group (^For a serving of )
and combine the digits and the unit into a single group.
^For\h+a\h+(?:serving|portion)\h+of\h+(\d+\h+g)\h+([^:\r\n]+):\h+(\d+(?:\.\d+)?\h+(?:g|kcal))\b
^
  Start of string
For\h+a\h+(?:serving|portion)\h+of\h+
  Match the opening words with either serving or portion, separated by 1+ horizontal whitespace chars
(\d+\h+g)\h+
  Capture group 1, match 1+ digits, whitespace and g, followed by 1+ horizontal whitespace chars
([^:\r\n]+):\h+
  Capture group 2, match 1+ times any char except : or a newline, followed by : and 1+ horizontal whitespace chars
(
  Capture group 3
\d+(?:\.\d+)?
  Match 1+ digits with an optional decimal part
\h+(?:g|kcal)
  Match 1+ horizontal whitespace chars and either g or kcal
)\b
  Close group 3, followed by a word boundary to prevent the unit being part of a longer word
For example:
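// Single quotes keep the backslashes literal so PCRE receives \h, \d, \r and \n as written
$pattern = '~^For\h+a\h+(?:serving|portion)\h+of\h+(\d+\h+g)\h+([^:\r\n]+):\h+(\d+(?:\.\d+)?\h+(?:g|kcal))\b~';
$strings = [
    "For a serving of 100 g Sugars: 2.3 g (Approximately)",
    "For a serving of 100 g Saturated Fat: 5.8 g (Approximately)",
    "For a portion of 100 g Energy Value: 290 kcal (Approximately)"
];
foreach ($strings as $string) {
    // preg_match returns 1 on a match; skip strings that do not fit the pattern
    if (preg_match($pattern, $string, $matches) === 1) {
        array_shift($matches); // drop the full match, keep the 3 capture groups
        print_r($matches);
    }
}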
Output
Array
(
    [0] => 100 g
    [1] => Sugars
    [2] => 2.3 g
)
Array
(
    [0] => 100 g
    [1] => Saturated Fat
    [2] => 5.8 g
)
Array
(
    [0] => 100 g
    [1] => Energy Value
    [2] => 290 kcal
)
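If the positional indexes are hard to follow, the same pattern could also be written with named capture groups and the x (extended) modifier, which allows comments inside the pattern. This is only a sketch: the function name parseNutritionLine and the group names serving, nutrient and amount are made up for illustration and are not part of the pattern above.

```php
<?php
// Hypothetical helper (name and array keys are assumptions for this sketch):
// returns the three parts as an associative array, or null when there is no match.
function parseNutritionLine(string $line): ?array
{
    // With the x modifier, whitespace in the pattern is ignored and # starts a
    // comment, so real spaces in the input are still matched with \h+.
    $pattern = '~
        ^For\h+a\h+(?:serving|portion)\h+of\h+
        (?<serving>\d+\h+g)\h+                    # e.g. "100 g"
        (?<nutrient>[^:\r\n]+):\h+                # e.g. "Saturated Fat"
        (?<amount>\d+(?:\.\d+)?\h+(?:g|kcal))\b   # e.g. "2.3 g" or "290 kcal"
    ~x';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }

    // $m holds both numeric and named keys; keep only the named ones.
    return [
        'serving'  => $m['serving'],
        'nutrient' => $m['nutrient'],
        'amount'   => $m['amount'],
    ];
}
```

Calling parseNutritionLine() on one of the example strings gives an array keyed by name instead of [0], [1], [2], and a string that does not follow the "For a serving/portion of ..." wording gives null.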
Upvotes: 2