32423hjh32423

Reputation: 3088

Regular expression for matching quantities and unit

I need to extract the quantity and unit from strings like these:

1 tbsp
1tbsp 
300ml
300 ml
10grams
10 g

The quantities will always be numbers, followed by an optional space and then the unit. There may be 15 to 20 different units, which can come from a list that we define (perhaps an array).

The solution can be in either JavaScript or PHP, because I need to split them before storing them in a database, i.e. the quantity and the unit need to be stored separately.

Thanks

EDIT: Sorry, to be clear: each new line represents a new string. That is, a string would only contain 10g OR 300ml, so we just need to split one quantity and one unit at a time.

Upvotes: 1

Views: 4847

Answers (3)

Daniel Vandersluis

Reputation: 94153

Okay, what you can do is create an array of allowed units, use array_map to apply preg_quote to each unit in the array (so that any characters in a unit that are special in a regular expression get escaped), and then construct a regular expression:

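$units = array("tbsp", "ml", "g", "grams"); // add whatever other units are allowed
// pass the '/' delimiter to preg_quote so a unit containing a slash is escaped too
$quoted = array_map(function ($unit) { return preg_quote($unit, '/'); }, $units);
$pattern = '/^(\d+)\s*(' . join("|", $quoted) . ')$/';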

The $pattern will thus become something like /^(\d+)\s*(tbsp|ml|g|grams)$/, and then you can use it to detect things that look like units in your string:

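$matches = array();
// assuming you have an array of measurement strings...
foreach ($measurement_strings as $measurement)
{
  // trim() drops stray surrounding whitespace (e.g. "1tbsp ");
  // preg_match returns 1 only when the whole string matches the pattern
  if (preg_match($pattern, trim($measurement), $matches))
  {
    list(, $quantity, $unit) = $matches;
    // ...
  }
}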

Because the pattern defines two capturing groups, for the quantity and unit respectively, you can then extract those out of the match and do what you want with them.
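Since the question also allows JavaScript, here is a minimal sketch of the same whitelist idea in that language. The names escapeRegExp and parseMeasurement are hypothetical helpers, not part of the answer above; escapeRegExp plays the role of preg_quote, and the unit list is only an example.

```javascript
// Plays the role of PHP's preg_quote: backslash-escape every character
// that has a special meaning inside a JavaScript RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Example list only (assumption); put your own 15-20 allowed units here.
const units = ["tbsp", "ml", "g", "grams"];

// Builds the equivalent of /^(\d+)\s*(tbsp|ml|g|grams)$/ from the list.
const pattern = new RegExp(
  "^(\\d+)\\s*(" + units.map(escapeRegExp).join("|") + ")$"
);

// Returns { quantity, unit } for one measurement string, or null when the
// string is not "<digits><optional whitespace><allowed unit>".
function parseMeasurement(str) {
  const m = pattern.exec(str.trim());
  if (m === null) return null;
  return { quantity: Number(m[1]), unit: m[2] };
}

parseMeasurement("10grams"); // returns { quantity: 10, unit: "grams" }
parseMeasurement("5 cups");  // returns null ("cups" is not in the list)
```

Because the alternation is anchored with ^ and $, "10grams" still yields "grams" even though "g" comes first in the list: when "g" cannot reach the end of the string, the engine backtracks and tries the next alternative.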

(I've updated my answer, based on the question update that each line is a separate string).

Upvotes: 4

quantumSoup

Reputation: 28132

Regex:

/(\d+)\s*(\D+)/

Code:

preg_match_all('/(\d+)\s*(\D+)/', $ingredients, $m);

$quantities = $m[1];
$units = array_map('trim', $m[2]);

$quantities and $units are:

Array
(
    [0] => 1
    [1] => 1
    [2] => 300
    [3] => 300
    [4] => 10
    [5] => 10
)
Array
(
    [0] => tbsp
    [1] => tbsp
    [2] => ml
    [3] => ml
    [4] => grams
    [5] => g
)

See: http://ideone.com/MSH8t

With this approach you don't need a list of units prepared in advance. However, it assumes your units contain no digits and your quantities are whole numbers only (a value such as 1.5 would not be split correctly).
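A JavaScript sketch of the same list-free approach, assuming all the lines arrive as one string, as $ingredients does above. splitAll is a hypothetical helper name; String.prototype.matchAll with the g flag stands in for preg_match_all here.

```javascript
// Collect every "<digits><optional whitespace><non-digits>" run in a block
// of text. Assumes units contain no digits and quantities are whole numbers.
function splitAll(text) {
  const quantities = [];
  const units = [];
  for (const m of text.matchAll(/(\d+)\s*(\D+)/g)) {
    quantities.push(Number(m[1]));
    units.push(m[2].trim()); // \D+ also swallows trailing spaces and newlines
  }
  return { quantities, units };
}

const ingredients = "1 tbsp\n1tbsp \n300ml\n300 ml\n10grams\n10 g";
const { quantities, units } = splitAll(ingredients);
// quantities: 1, 1, 300, 300, 10, 10
// units: tbsp, tbsp, ml, ml, grams, g
```

The trim() call matters because \D+ matches whitespace too, so without it the unit of "1tbsp " would come back as "tbsp \n".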

Upvotes: 4

jwaliszko

Reputation: 17064

Maybe something simple like this is enough:

^([0-9]+)\s*([a-zA-Z]+)\s*$
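One way this pattern might be used in JavaScript, one string at a time, is sketched below. splitQuantityUnit, splitChecked and the allowed-unit set are hypothetical names for illustration. Because [a-zA-Z]+ accepts any run of letters, the sketch also checks the captured unit against a list, as the question suggests.

```javascript
// The answer's pattern: digits, optional whitespace, a run of ASCII letters,
// then optional trailing whitespace, anchored to the whole string.
const re = /^([0-9]+)\s*([a-zA-Z]+)\s*$/;

// Returns [quantity, unit], or null if the string doesn't fit the pattern.
function splitQuantityUnit(str) {
  const m = re.exec(str);
  return m ? [Number(m[1]), m[2]] : null;
}

// Hypothetical allowed list: the pattern alone would also accept typos like "tbps".
const allowedUnits = new Set(["tbsp", "ml", "g", "grams"]);

// Like splitQuantityUnit, but rejects units that are not in allowedUnits.
function splitChecked(str) {
  const r = splitQuantityUnit(str);
  return r !== null && allowedUnits.has(r[1]) ? r : null;
}
```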

Upvotes: 2
