Reputation: 3088
I need to extract the quantity and unit from strings like these:
1 tbsp
1tbsp
300ml
300 ml
10grams
10 g
The quantities will always be numbers, followed by an optional space and then the unit. There may be 15-20 different units, which can come from a list that we define (perhaps an array).
The solution can be in either JavaScript or PHP, as I need to split them before storing them in a database, i.e. the quantity and the unit need to be stored separately.
Thanks
EDIT: Sorry, to be clear: each new line represents a new string. That is, a string would contain only 10g OR 300ml, so we just need to split one quantity and one unit at a time.
Upvotes: 1
Views: 4847
Reputation: 94153
Okay, what you can do is create an array of allowed units, then use array_map to apply preg_quote to each unit in the array (so that any characters in a unit that are special in a regular expression are escaped), and then construct a regular expression:
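$units = array("tbsp", "ml", "g", "grams"); // add whatever other units are allowed
// pass "/" to preg_quote so the pattern delimiter is also escaped inside unit names
$quoted = array_map(function ($unit) { return preg_quote($unit, "/"); }, $units);
$pattern = '/^(\d+)\s*(' . join("|", $quoted) . ')$/';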
The $pattern will thus become something like /^(\d+)\s*(tbsp|ml|g|grams)$/, and then you can use it to detect things that look like units in your string:
$matches = array();
// assuming you have an array of measurement strings...
foreach ($measurement_strings as $measurement)
{
    // preg_match returns 1 only on a match; skip strings that do not fit the pattern
    if (preg_match($pattern, $measurement, $matches))
    {
        list(, $quantity, $unit) = $matches;
        // ...
    }
}
Because the pattern defines two capturing groups, for the quantity and unit respectively, you can then extract those out of the match and do what you want with them.
(I've updated my answer, based on the question update that each line is a separate string).
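Since the question also allows JavaScript, here is a rough sketch of the same allowed-units idea in that language. The names UNITS, escapeRegExp and parseMeasurement are made up for illustration, and the unit list is only a placeholder; JavaScript has no built-in equivalent of preg_quote, so the escaping helper is hand-rolled.

```javascript
// Hypothetical unit list; extend with whatever units are allowed.
const UNITS = ["tbsp", "ml", "g", "grams"];

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Try longer units first so "grams" is attempted before "g".
const alternation = [...UNITS]
  .sort((a, b) => b.length - a.length)
  .map(escapeRegExp)
  .join("|");
const MEASUREMENT = new RegExp("^(\\d+)\\s*(" + alternation + ")$");

// Returns { quantity, unit }, or null when the string does not match.
function parseMeasurement(input) {
  const match = MEASUREMENT.exec(input.trim());
  if (match === null) return null;
  return { quantity: Number(match[1]), unit: match[2] };
}
```

Returning null for anything that is not in the list makes it easy to reject or log unexpected input before it reaches the database.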
Upvotes: 4
Reputation: 28132
Regex:
/(\d+)\s*(\D+)/
Code:
preg_match_all('/(\d+)\s*(\D+)/', $ingredients, $m);
$quantities = $m[1];
$units = array_map('trim', $m[2]);
$quantities and $units are:
Array
(
[0] => 1
[1] => 1
[2] => 300
[3] => 300
[4] => 10
[5] => 10
)
Array
(
[0] => tbsp
[1] => tbsp
[2] => ml
[3] => ml
[4] => grams
[5] => g
)
If you use this, you don't need a list of units ready. But it assumes that your units contain no numeric characters and that your quantities consist of digits only.
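For comparison, a possible JavaScript version of the same list-free pattern, assuming the sample strings arrive as one newline-separated string (the variable names here are illustrative only). It relies on String.prototype.matchAll, which needs a reasonably modern runtime.

```javascript
// Sample input: the strings from the question, one per line.
const ingredients = "1 tbsp\n1tbsp\n300ml\n300 ml\n10grams\n10 g";

const quantities = [];
const units = [];

// Group 1 captures the digits, group 2 captures everything up to the next digit.
for (const match of ingredients.matchAll(/(\d+)\s*(\D+)/g)) {
  quantities.push(Number(match[1]));
  // \D+ also swallows the trailing newline, so trim it off.
  units.push(match[2].trim());
}
```

As with the PHP version, nothing here validates the unit, so a typo such as "tbps" would be stored as-is.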
Upvotes: 4
Reputation: 17064
Maybe something simple is enough, like this:
^([0-9]+)\s*([a-zA-Z]+)\s*$
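A small JavaScript sketch of how that pattern might be applied to one string at a time; splitMeasurement is a hypothetical helper name. Because the pattern is anchored at both ends, anything that is not exactly digits followed by letters (a decimal quantity, for instance) is rejected rather than partially matched.

```javascript
// Anchored pattern: digits, optional whitespace, letters, optional trailing whitespace.
const SIMPLE = /^([0-9]+)\s*([a-zA-Z]+)\s*$/;

// Returns [quantity, unit] as strings, or null if the input does not fit.
function splitMeasurement(input) {
  const match = SIMPLE.exec(input);
  return match === null ? null : [match[1], match[2]];
}
```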
Upvotes: 2