OK..
I'm working blind on a "big" product web app...
We've got a couple of thousand products each with a bunch of data elements coming in from multiple vendors in various formats... so, needless to say, we can't see the data...
Here's the short version of today's problem...
We want to extract the "size" from the "product name":
$product_name = "Socket Assembly w/ 25 ft Lamp Cord - 14 Gauge ";
And here's part of the Sizes array...
$lookForTheseSizes = array( ...'Gallon','gal','Gal','G','Gram','gram','g','gm','Gauge','gauge'... );
The Sizes array, currently with around 100 values, is built dynamically and may change with new values added without notice.
So this script does not always work, because its result depends on the order of the values in the Sizes array:
foreach ($lookForTheseSizes as $key => $value) {
    if (strpos($nameChunk, $value) !== false) {
        echo 'match ' . $nameChunk . ' => ' . $value . '<br/>';
        $size = $value;
        break;
    }
}
For example, when $nameChunk = "Gauge", the script returns a "match" on 'g' first, because strpos() also reports a unit that merely appears inside a longer word.
So my question is this: is there a way (a regex, or a standard PHP 5.4+ function) to do an exact find/match WITHOUT first sorting the Sizes array?
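A minimal PHP sketch of why the loop is order-dependent, assuming $nameChunk holds one whole word of the product name (the question does not show how the chunks are produced). strpos() reports substring hits, while a strict in_array() comparison only accepts a unit that equals the whole chunk. findExactUnit() is a hypothetical helper name, not an existing function.

```php
<?php
// Sketch only: assumes $nameChunk is a single whole word from the product name.

// strpos() reports substring hits, so short units are "found" inside longer words.
var_dump(strpos('Gauge', 'G')); // int(0)
var_dump(strpos('Gauge', 'g')); // int(3)

// Hypothetical helper: accept a unit only when it equals the whole chunk.
// Exact equality means the order of $units has no effect on the result;
// strict = true also avoids PHP's loose type juggling.
function findExactUnit($nameChunk, array $units)
{
    return in_array($nameChunk, $units, true) ? $nameChunk : null;
}

$lookForTheseSizes = array('Gallon', 'gal', 'Gal', 'G', 'Gram', 'gram', 'g', 'gm', 'Gauge', 'gauge');
var_dump(findExactUnit('Gauge', $lookForTheseSizes)); // string(5) "Gauge"
var_dump(findExactUnit('Gaug', $lookForTheseSizes));  // NULL
```

This only helps if the chunks really are whole words; if a chunk can carry extra characters (e.g. "Gauge,"), a word-boundary regex is the more robust choice.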
Reputation: 16214
$product_name = "Socket Assembly w/ 25 ft Lamp Cord - 14 Gauge ";
$lookForTheseSizes = array('Gallon', 'gal', 'Gal', 'G', 'Gram', 'gram', 'g',
                           'gm', 'Gauge', 'gauge', 'ft');

foreach ($lookForTheseSizes as $unit)
{
    // \b requires the unit to end at a word boundary, so 'G' cannot match inside 'Gauge'.
    // Pass '/' to preg_quote() because it is the pattern delimiter.
    if (preg_match('/(?P<size>[\d.]+)\s*' . preg_quote($unit, '/') . '\b/U',
                   $product_name, $matches))
        echo $matches['size'] . " " . $unit . "\n";
}
Result
14 Gauge
25 ft
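Or, put all the units into one alternation. Thanks to the \b word boundary, their order in the array doesn't matter: when 'G' fails to end at a word boundary inside "Gauge", the regex engine backtracks and tries the next alternative.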
$units = join('|', array_map(function ($unit) { return preg_quote($unit, '/'); },
                             $lookForTheseSizes));
if (preg_match_all('/(?P<size>[\d.]+)\s*(?P<unit>' . $units . ')\b/U',
                   $product_name, $matches))
    var_dump($matches);
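Inspect $matches and use whatever you need from it. The relevant part of the var_dump() output (the numbered copies [1] and [2] of the named groups are omitted):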
  [0]=>
  array(2) {
    [0]=>
    string(5) "25 ft"
    [1]=>
    string(8) "14 Gauge"
  }
  ["size"]=>
  array(2) {
    [0]=>
    string(2) "25"
    [1]=>
    string(2) "14"
  }
  ["unit"]=>
  array(2) {
    [0]=>
    string(2) "ft"
    [1]=>
    string(5) "Gauge"
  }
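If you need the hits as size/unit pairs, the parallel arrays $matches['size'] and $matches['unit'] can be zipped by index. A minimal sketch wrapping the preg_match_all() version in a function; extractSizes() is a hypothetical name, and quoting each unit with the '/' delimiter guards against units that contain regex metacharacters.

```php
<?php
// Sketch: the preg_match_all() approach from this answer, wrapped in a
// hypothetical helper that zips the named groups into size/unit pairs.
function extractSizes($productName, array $units)
{
    // Quote each unit with the '/' delimiter so metacharacters are literal.
    $alternation = implode('|', array_map(function ($unit) {
        return preg_quote($unit, '/');
    }, $units));
    $pattern = '/(?P<size>[\d.]+)\s*(?P<unit>' . $alternation . ')\b/U';

    $pairs = array();
    if (preg_match_all($pattern, $productName, $matches)) {
        // $matches['size'][$i] and $matches['unit'][$i] come from the same hit.
        foreach ($matches['size'] as $i => $size) {
            $pairs[] = array('size' => $size, 'unit' => $matches['unit'][$i]);
        }
    }
    return $pairs;
}

$product_name = "Socket Assembly w/ 25 ft Lamp Cord - 14 Gauge ";
$lookForTheseSizes = array('Gallon', 'gal', 'Gal', 'G', 'Gram', 'gram', 'g',
                           'gm', 'Gauge', 'gauge', 'ft');
foreach (extractSizes($product_name, $lookForTheseSizes) as $pair) {
    echo $pair['size'] . ' ' . $pair['unit'] . "\n"; // "25 ft", then "14 Gauge"
}
```

Hits come back in the order they appear in the product name, not in the order of the units array.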
I would remove the units that differ only in case (e.g. 'gal'/'Gal', 'gram'/'Gram', 'gauge'/'Gauge') from the array and add the i modifier to the regex, i.e. /iU instead of /U.
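A sketch of that case-insensitive variant, assuming you want one canonical spelling per unit: the first spelling listed in the array wins, and the matched text (in any case) is mapped back to it. extractSizesIgnoreCase() is a hypothetical name. Note the trade-off: 'G' and 'g' collapse into a single entry, so if they stand for different units they can no longer be told apart.

```php
<?php
// Sketch of the case-insensitive variant (/iU): units that differ only in case
// are reduced to one canonical spelling (the first one listed).
// Caveat: 'G' and 'g' collapse into a single entry.
function extractSizesIgnoreCase($productName, array $units)
{
    // Map lower-cased unit => first spelling seen in $units.
    $canonical = array();
    foreach ($units as $unit) {
        $key = strtolower($unit);
        if (!isset($canonical[$key])) {
            $canonical[$key] = $unit;
        }
    }

    // Quote each unit with the '/' delimiter so metacharacters are literal.
    $quoted = array_map(function ($unit) {
        return preg_quote($unit, '/');
    }, $canonical);
    $pattern = '/(?P<size>[\d.]+)\s*(?P<unit>' . implode('|', $quoted) . ')\b/iU';

    $pairs = array();
    if (preg_match_all($pattern, $productName, $matches)) {
        foreach ($matches['size'] as $i => $size) {
            // Map the matched text (any case) back to its canonical spelling.
            $pairs[] = array(
                'size' => $size,
                'unit' => $canonical[strtolower($matches['unit'][$i])],
            );
        }
    }
    return $pairs;
}

$units = array('Gallon', 'gal', 'Gal', 'G', 'Gram', 'gram', 'g', 'gm', 'Gauge', 'gauge', 'ft');
foreach (extractSizesIgnoreCase("Lamp Cord - 14 GAUGE, 2 Gal", $units) as $pair) {
    echo $pair['size'] . ' ' . $pair['unit'] . "\n"; // "14 Gauge", then "2 gal"
}
```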