The goal is to scan product descriptions and identify different characteristics (product options). The input data has the following structure:
// TABLE WITH INPUT DATA. STRUCTURE: PRODUCT_CATEGORY [0], PRODUCT_NUMBER[1], DESCRIPTION OF AN OPTION [2]. THE INPUT DATA TABLE CAN CONSIST OF UP TO 400-500 ROWS
$input_product_data = array (
array('AAAA','1111','Chimney with red bricks in the center of the room'),
array('BBBB','2222','Two wide windows in the main floor'),
array('BBBB','2233','Plastic window has to be changed later'),
array('CCCC','3333','Roof tiles renewed in 2015'),
array('NULL','4444','Floor has been renovated for two years. Currently it has ground in wood.'),
array('NULL','NULL','Beautiful door in green color built at begin of 20th century')
);
There are 3 different constellations that indicate a product option:
Example:
Input data: array('NULL','NULL','Beautiful door in green color built at begin of 20th century')
Search string: 'green color' within PRODUCT_DESCRIPTION
Result: Available
Example:
Input data: array('CCCC','NULL','Roof tiles renewed in 2015'),
Search strings: 'CCCC' within PRODUCT_CATEGORY + 'Roof tiles' within PRODUCT_DESCRIPTION
Result: Available
Example:
Input data: array('AAAA','1111','Chimney with red bricks in the center of the room')
Search strings: 'AAAA' within PRODUCT_CATEGORY + '1111' within PRODUCT_NUMBER + 'Chimney' within PRODUCT_DESCRIPTION
Result: Available
IMPORTANT:
IMPLEMENTATION VARIANT A (using preg_match):
// TABLE FOR PRODUCT OPTIONS. STRUCTURE: ID[0], OPTION NAME[1], OPTION CATEGORY[2], OPTION-FAMILY[3], PROD.-NR[4], REG. EXPRESSION[5], PRIORITY[6], OUTPUT[7]
$ct_product_options = array (
array('0001', 'Chimney', 'Additional options', '/^AAAA/', '/9999/', '/^Chimney with./', '0', 'Available'),
array('0002', 'Material of ground floor', 'Additional options', '/NULL/', '/^4444$/', '/.wood./', '0', 'Wood'),
array('0003', 'Roof tiles', 'Basic options', '/^CCCC/', '/0022/', '/^Roof tiles./', '0', 'Available'),
array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.window$/', '0', 'Available'),
array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.wide windows./', '0', 'Available'),
array('0005', 'Door color', 'Basic options', '/NULL/', '/NULL/', '/green/', '0', 'Green'),
array('0006', 'Air condition', 'Additional options', '/NULL/', '/NULL/', '/^Air condition made in Japan/', '0', 'Green')
);
// FOR LOOP TO MAKE COMPARISON BETWEEN INPUT PRODUCT DATA AND PREDEFINED CUST. STRINGS
$matches_array = array();
foreach ($input_product_data as [$product_family, $product_number, $product_description]) {
foreach($ct_product_options as [$option_id, $option_name, $option_category, $product_family_reg_exp, $product_number_reg_exp, $regular_expression, $priority, $output]) {
if (preg_match($regular_expression, $product_description) == 1
&& preg_match($product_family_reg_exp, $product_family) == 1 ||
preg_match($regular_expression, $product_description) == 1
&& preg_match($product_number_reg_exp, $product_number) == 1) {
$matches_array [] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output"=> $output);
}
else {
if (empty($product_family) && empty($product_number)) {
if (preg_match($regular_expression, $product_description) == 1) {
$matches_array [] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output"=> $output);
}
}
}
}
}
//echo "<pre>";
//print_r($matches_array);
// FUNCTION TO DELETE DUPLICATES FROM ARRAY WITH MATCHES
function unique_multidimensional_array($array, $key) {
$temp_array = array();
$i = 0;
$key_array = array();
foreach($array as $val) {
if (!in_array($val[$key], $key_array)) {
$key_array[$i] = $val[$key];
$temp_array[$i] = $val;
}
$i++;
}
return $temp_array;
}
//echo "<br><h3>UNIQUE MATCHES</h3>";
// CALL OF THE FUNCTION TO GET UNIQUE MATCHES
$unique_matches = unique_multidimensional_array($matches_array, 'id');
sort($unique_matches);
//echo "<pre>";
//print_r($unique_matches);
// CALL OF THE FUNCTION TO CREATE LIST/ARRAY WITH ALL AVAILABLE PRODUCT OPTIONS
$list_all_product_options = unique_multidimensional_array($ct_product_options, 0);
$list_all_product_options_short = array();
foreach ($list_all_product_options as $option_item) {
$list_all_product_options_short[] = array("id" => $option_item[0], "option_name" => $option_item[1], "option_category" => $option_item[2]);
}
sort($list_all_product_options_short);
//echo "<h3>LIST WITH ALL PRODUCT OPTIONS (SHORT VERSION)</h3>\n";
//echo "<pre>";
//print_r($list_all_product_options_short);
$unique_matches = array_column($unique_matches, null, 'id');
foreach ($list_all_product_options_short as $key => $value) {
if (isset($unique_matches[$value['id']])) {
$result[$key] = array_merge($value, $unique_matches[$value['id']]);
} else {
$result[$key] = array_merge($value, ['output' => 'Not available']);
}
}
echo "<h3>FINAL RESULTS</h3>\n";
//echo "<pre><br>\n";
print_r($result);
The variant implemented with preg_match works well and provides good flexibility in how the regexes are defined. For example, instead of defining the whole product number "2222", I can use just "/^2.../". I can also combine several regexes within one row using "|" (e.g. ".wide windows. | some window | etc."). The problem is that with the real data volume (500 rows in $input_product_data and 3000 rows in $ct_product_options) the code is quite slow.
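To illustrate the kind of flexibility meant here, a minimal sketch follows. The sample values and the combined pattern are made up for illustration and are not taken from the real rule table.

```php
<?php
// Hypothetical sample values, modelled on the tables above.
$product_numbers = ['2222', '2233', '3333'];
$descriptions = [
    'Two wide windows in the main floor',
    'Plastic window has to be changed later',
    'Roof tiles renewed in 2015',
];

// One prefix pattern covers every number that starts with "2"
// followed by at least three more characters.
$number_hits = preg_grep('/^2.../', $product_numbers);

// Alternation combines several description patterns in a single rule.
// Spaces around "|" would be matched literally, so none are used here.
$description_hits = preg_grep('/wide windows|plastic window/i', $descriptions);

print_r(array_values($number_hits));
print_r(array_values($description_hits));
```

preg_grep keeps the original array keys, which is why array_values is applied before printing.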
IMPLEMENTATION VARIANT B (using stripos):
// INPUT DATA WITH PRODUCT DESCRIPTION. STRUCTURE: PROD. FAMILY, PROD. NUMBER, PRODUCT DESCRIPTION
$input_product_data = array (
array('AAAA','1111','Chimney with red bricks in the center of the room'),
array('BBBB','2222','Two wide windows in the main floor'),
array('BBBB','2233','Plastic window has to be changed later'),
array('CCCC','3333','Roof tiles renewed in 2015'),
array('NULL','4444','Floor has been renovated for two years. Currently it has ground in wood.'),
array('NULL','NULL','Beautiful door in green color built at begin of 20th century')
);
// CUSTOMIZING TABLE FOR PRODUCT OPTIONS. STRUCTURE: ID[0], OPTION NAME[1], OPTION CATEGORY[2], OPTION-FAMILY[3], PROD.-NR[4], REG. EXPRESSION[5], PRIORITY[6], OUTPUT[7]
$ct_product_options = array (
array('0001', 'Chimney', 'Additional options', 'AAAA', '9999', 'Chimney with', '0', 'Available'),
array('0002', 'Material of ground floor', 'Additional options', 'NULL', '4444', 'wood', '0', 'Wood'),
array('0003', 'Roof tiles', 'Basic options', 'CCCC', '0022', 'Roof tiles', '0', 'Available'),
array('0004', 'Windows', 'Basic options', 'BBBB', '2222', 'window', '0', 'Available'),
array('0004', 'Windows', 'Basic options', 'BBBB', '2222', 'wide windows', '0', 'Available'),
array('0005', 'Door color', 'Basic options', 'NULL', 'NULL', 'green', '0', 'Green'),
array('0006', 'Air condition', 'Additional options', 'NULL', 'NULL', 'Air condition made in Japan', '0', 'Green')
);
// IMPORTANT: SEVERAL REG. EXPRESSIONS CAN BE DEFINED FOR ONE OPTION (e.g. 10 DIFFERENT REG. EXPRESSIONS FOR WINDOW). DOTS "." REPRESENT SPACES, WHICH ARE IMPORTANT TO IDENTIFY AN OPTION EXACTLY.
// FOR LOOP TO MAKE COMPARISON BETWEEN INPUT PRODUCT DATA AND PREDEFINED CUST. STRINGS
$matches_array = array();
foreach ($input_product_data as [$product_family, $product_number, $product_description]) {
foreach($ct_product_options as [$option_id, $option_name, $option_category, $product_family_reg_exp, $product_number_reg_exp, $regular_expression, $priority, $output]) {
if (stripos($product_description, $regular_expression) !== false
&& stripos($product_family, $product_family_reg_exp) !== false ||
stripos($product_description, $regular_expression) !== false
&& stripos($product_number, $product_number_reg_exp) !== false) {
$matches_array [] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output"=> $output);
}
else {
if (empty($product_family) && empty($product_number)) {
if (stripos($product_description, $regular_expression) !== false) {
$matches_array [] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output"=> $output);
}
}
}
}
}
//echo "<pre>";
//print_r($matches_array);
// FUNCTION TO DELETE DUPLICATES FROM ARRAY WITH MATCHES
function unique_multidimensional_array($array, $key) {
$temp_array = array();
$i = 0;
$key_array = array();
foreach($array as $val) {
if (!in_array($val[$key], $key_array)) {
$key_array[$i] = $val[$key];
$temp_array[$i] = $val;
}
$i++;
}
return $temp_array;
}
//echo "<br><h3>UNIQUE MATCHES</h3>";
// CALL OF THE FUNCTION TO GET UNIQUE MATCHES
$unique_matches = unique_multidimensional_array($matches_array, 'id');
sort($unique_matches);
//echo "<pre>";
//print_r($unique_matches);
// CALL OF THE FUNCTION TO CREATE LIST/ARRAY WITH ALL AVAILABLE PRODUCT OPTIONS
$list_all_product_options = unique_multidimensional_array($ct_product_options, 0);
$list_all_product_options_short = array();
foreach ($list_all_product_options as $option_item) {
$list_all_product_options_short[] = array("id" => $option_item[0], "option_name" => $option_item[1], "option_category" => $option_item[2]);
}
sort($list_all_product_options_short);
//echo "<h3>LIST WITH ALL PRODUCT OPTIONS (SHORT VERSION)</h3>\n";
//echo "<pre>";
//print_r($list_all_product_options_short);
// ::::::::::::::::::::::::::::::::::
$unique_matches = array_column($unique_matches, null, 'id');
foreach ($list_all_product_options_short as $key => $value) {
if (isset($unique_matches[$value['id']])) {
$result[$key] = array_merge($value, $unique_matches[$value['id']]);
} else {
$result[$key] = array_merge($value, ['output' => 'Not available']);
}
}
echo "<h3>FINAL RESULTS</h3>\n";
//echo "<pre><br>\n";
print_r($result);
It runs much faster, but does not provide the flexibility of regex.
So, my questions:
Do you see any way to optimize VARIANT A to make it faster, or to optimize VARIANT B to make it more flexible?
Special question: How can I add the logic for the PRIORITY parameter from the table $ct_product_options?
The business logic is as follows: by default, all rows/rules have priority "0", but some of them will get a priority greater than 0 (e.g. "1" or "2"). The rule with the highest priority should overwrite the other rules.
For example:
This rule with priority "0" identifies windows in the house.
array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.wide windows./', '0', 'Available')
At the same time, this rule with priority "1" tells us that the windows are no longer available. That means we have to get "Not available" in the final results.
array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/^Windows have been removed from the whole building last year/', '1', 'Not available')
Reputation: 2115
Before optimizing the variants, I should explain how I would implement a solution that generates the intended array.
I ran your code to better understand what the result should be. But instead of using print_r, I did this:
echo json_encode($result, JSON_PRETTY_PRINT);
I got this:
[
{
"id": "0001",
"option_name": "Chimney",
"option_category": "Additional options",
"output": "Available"
},
{
"id": "0002",
"option_name": "Material of ground floor",
"option_category": "Additional options",
"output": "Wood"
},
{
"id": "0003",
"option_name": "Roof tiles",
"option_category": "Basic options",
"output": "Available"
},
{
"id": "0004",
"option_name": "Windows",
"option_category": "Basic options",
"output": "Available"
},
{
"id": "0005",
"option_name": "Door color",
"option_category": "Basic options",
"output": "Green"
},
{
"id": "0006",
"option_name": "Air condition",
"option_category": "Additional options",
"output": "Not available"
}
]
I noticed each array element is an element from $ct_product_options mapped to some format. So, I used array_map like this:
$result = array_map(
    fn($option) => [
        'id' => $option[0],
        'option_name' => $option[1],
        'option_category' => $option[2],
        'output' => get_option_output($option, $input_product_data),
    ],
    $ct_product_options
);
Now I have to implement get_option_output. I think all those nested foreach and if statements in both variants A and B make the code hard to understand (on top of how each line is indented). If I understand your intentions correctly, this condition may not do what you meant:
if (
    preg_match($regular_expression, $product_description) == 1
    && preg_match($product_family_reg_exp, $product_family) == 1 ||
    preg_match($regular_expression, $product_description) == 1
    && preg_match($product_number_reg_exp, $product_number) == 1) {
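I assume you wanted something like this (PHP evaluates && before ||, so the original is already grouped this way; the explicit parentheses make that visible and the description is matched only once):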
$productDescriptionMatches = preg_match($regular_expression, $product_description);
if (
    (
        $productDescriptionMatches
        && preg_match($product_family_reg_exp, $product_family)
    ) || (
        $productDescriptionMatches
        && preg_match($product_number_reg_exp, $product_number)
    )
) {
Which is equivalent to:
if (
    preg_match($regular_expression, $product_description)
    && (
        preg_match($product_family_reg_exp, $product_family)
        || preg_match($product_number_reg_exp, $product_number)
    )
) {
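A quick way to convince yourself that the ungrouped condition from the question and the shortened form agree is to compare them for every combination of inputs. The sketch below uses plain booleans as stand-ins for the three preg_match results; the variable names are only illustrative.

```php
<?php
// $a: description matches, $b: family matches, $c: number matches.
$allEqual = true;
foreach ([false, true] as $a) {
    foreach ([false, true] as $b) {
        foreach ([false, true] as $c) {
            // Form used in the question: && binds tighter than ||.
            $ungrouped = $a && $b || $a && $c;
            // Shortened form with the common operand factored out.
            $factored = $a && ($b || $c);
            if ($ungrouped !== $factored) {
                $allEqual = false;
            }
        }
    }
}
var_dump($allEqual); // bool(true)
```

Both forms give the same result for all eight combinations, so the shortened form is a safe replacement that also avoids matching the description twice.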
If I read everything correctly, and assuming that is the condition you intended, I believe you want something like this:
// Returns the first item for which $callback is truthy, or false if none is.
function some($array, $callback)
{
    foreach ($array as $item) {
        if ($callback($item)) {
            return $item;
        }
    }
    return false;
}
function get_option_output($option, $products)
{
    $found = some(
        $products,
        fn($product) =>
            (
                preg_match($option[5], $product[2])
                && (
                    preg_match($option[3], $product[0])
                    || preg_match($option[4], $product[1])
                    || (
                        empty($product[0])
                        && empty($product[1])
                    )
                )
            )
    );
    return $found ? $option[7] : 'Not available';
}
$result = array_map(
    fn($option) => [
        'id' => $option[0],
        'option_name' => $option[1],
        'option_category' => $option[2],
        'output' => get_option_output($option, $input_product_data),
    ],
    $ct_product_options
);
On average, the execution time of that code was 0.0000189903259277 seconds over 10,000 iterations.
Variant A took on average 0.0000316595554352 seconds. Variant B took on average 0.0000314178943634 seconds.
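The measuring harness itself is not shown here, so the following is only a sketch of how such averages could be taken. The helper name average_seconds is hypothetical, and the closure is a stand-in workload that you would replace with the variant to be measured.

```php
<?php
// Hypothetical helper: average wall-clock time of $callback in seconds.
function average_seconds(callable $callback, int $iterations = 10000): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $callback();
    }
    return (hrtime(true) - $start) / $iterations / 1e9;
}

// Stand-in workload: one regex match taken from the sample data.
$average = average_seconds(
    fn() => preg_match('/^Roof tiles./', 'Roof tiles renewed in 2015')
);
printf("%.16f seconds\n", $average);
```

Absolute numbers depend heavily on the machine and PHP version, so only the ratio between variants measured in the same run is meaningful.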
The code I provided avoids writing out nested loops and doesn't have to remove repeated elements or sort twice. But it's possible to make it run faster:
$result = [];
foreach ($ct_product_options as $option) {
    // Reset once per option, so an empty product list cannot reuse a stale value.
    $output = null;
    foreach ($input_product_data as $product) {
        $isAvailable =
            (
                preg_match($option[5], $product[2])
                && (
                    preg_match($option[3], $product[0])
                    || preg_match($option[4], $product[1])
                    || (
                        empty($product[0])
                        && empty($product[1])
                    )
                )
            );
        if ($isAvailable) {
            // Stop at the first matching product.
            $output = $option[7];
            break;
        }
    }
    $result[] = [
        'id' => $option[0],
        'option_name' => $option[1],
        'option_category' => $option[2],
        'output' => $output ?? 'Not available',
    ];
}
It took, on average, 0.0000132960796356 seconds. But it's harder to understand.
That answers the first question: use array_map.
It also helps to answer the special question: change the function get_option_output accordingly.
If the priority selects which regular expression should be used (and all the others should be ignored), then do something like this (you should also check that the priority is valid):
function get_option_output($option, $products)
{
    $priority = (int)$option[6];
    // Uses the some() helper defined earlier.
    $found = some(
        $products,
        fn($product) => preg_match(
            $option[3 + $priority],
            $product[$priority]
        )
    );
    return $found ? $option[7] : 'Not available';
}
If the one with the highest priority should be checked first, and the others should also be checked:
// This version of some() also passes the index and returns a boolean.
function some($array, $callback)
{
    foreach ($array as $index => $item) {
        if ($callback($item, $index)) {
            return true;
        }
    }
    return false;
}
function get_option_output($option, $products)
{
    $priority = (int)$option[6];
    $found = some(
        $products,
        fn($product) =>
            preg_match($option[3 + $priority], $product[$priority])
            || some(
                $product,
                fn($text, $index) =>
                    $index !== $priority
                    && preg_match($option[3 + $index], $product[$index])
            )
    );
    return $found ? $option[7] : 'Not available';
}
If I misunderstood the details or something is missing, what is provided here will probably still help.
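Another reading of the PRIORITY rule in the question is that several rules sharing one option ID compete, and the output of the matching rule with the highest priority wins. The sketch below follows that reading. The function names rule_matches and resolve_outputs are hypothetical, the rows are sample data in the question's column layout, and ties between equal priorities are resolved in favour of the first matching rule, which is an assumption.

```php
<?php
// Rule columns: id, name, category, family regex, number regex,
// description regex, priority, output.
// Product columns: family, number, description.
function rule_matches(array $rule, array $product): bool
{
    [$family, $number, $description] = $product;
    return preg_match($rule[5], $description) === 1
        && (
            preg_match($rule[3], $family) === 1
            || preg_match($rule[4], $number) === 1
            || (empty($family) && empty($number))
        );
}

function resolve_outputs(array $rules, array $products): array
{
    $best = []; // option id => ['priority' => int, 'output' => string]
    foreach ($rules as $rule) {
        $id = $rule[0];
        $priority = (int) $rule[6];
        // Skip rules that cannot beat the current winner for this id.
        if (isset($best[$id]) && $best[$id]['priority'] >= $priority) {
            continue;
        }
        foreach ($products as $product) {
            if (rule_matches($rule, $product)) {
                $best[$id] = ['priority' => $priority, 'output' => $rule[7]];
                break;
            }
        }
    }
    $result = [];
    foreach ($rules as $rule) {
        // Each option id appears once; ids without a match get the default.
        $result[$rule[0]] ??= [
            'id' => $rule[0],
            'option_name' => $rule[1],
            'option_category' => $rule[2],
            'output' => $best[$rule[0]]['output'] ?? 'Not available',
        ];
    }
    return array_values($result);
}

// Sample data (assumed for illustration).
$products = [
    ['BBBB', '2222', 'Windows have been removed from the whole building last year'],
    ['BBBB', '2233', 'Two wide windows in the main floor'],
    ['CCCC', '3333', 'Roof tiles renewed in 2015'],
];
$rules = [
    ['0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.wide windows./', '0', 'Available'],
    ['0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/^Windows have been removed/', '1', 'Not available'],
    ['0003', 'Roof tiles', 'Basic options', '/^CCCC/', '/0022/', '/^Roof tiles./', '0', 'Available'],
    ['0006', 'Air condition', 'Additional options', '/NULL/', '/NULL/', '/^Air condition made in Japan/', '0', 'Green'],
];
$resolved = resolve_outputs($rules, $products);
echo json_encode($resolved, JSON_PRETTY_PRINT);
```

With this sample data the priority "1" rule overrides the priority "0" rule for option 0004, so Windows is reported as "Not available", while Roof tiles stays "Available".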
unique_multidimensional_array reimplementation
function unique_multidimensional_array($array, $key) {
    $valuesByKey = [];
    foreach ($array as $value) {
        // Keep only the first element seen for each key value, as the original function does.
        $valuesByKey[$value[$key]] ??= $value;
    }
    return array_values($valuesByKey);
}