StackUser

Reputation: 94

How to get specific part of string in php using regex?

I have product names in an array:

$productArr = array("Dell Inspiron 15 3521 Laptop CDC/ 2GB/ 500GB/ Linux Black Matte Textured Finish","Dell Inspiron 15 3521 Laptop CDC/ 4GB/ 500GB/ Win8 Black","Nikon D90 DSLR (Black) with AF-S 18-105mm VR Kit Lens");

I want to get output like this:

$productModArr = array("Dell Inspiron 15 3521","Dell Inspiron 15 3521","Nikon D90")

Then I need to remove the duplicate strings:

$productModArr = array("Dell Inspiron 15 3521","Nikon D90")

I tried substr() and strpos(), but they do not work in my case.

Upvotes: 0

Views: 123

Answers (3)

mickmackusa

Reputation: 47894

In my battery of test strings, I have included all sample strings that you have mentioned in the question and/or in comments.

My pattern assumes that you want to keep the substring from the start of each string through the end of the first run of consecutive space-separated words that contain a digit. This is as difficult to explain in plain English as it is to write as a regular expression. :)

Pattern Explanation:

/                           #opening pattern delimiter
^                           #match from the start of the string
\D*                         #match zero or more characters that are not a digit
(?: [a-z\d-]*\d[a-z\d-]*)+  #one or more times: a space, then a run of letters, digits and hyphens containing at least one digit
\K                          #reset the match start so only the text matched after this point is replaced (avoids a capture group)
.*                          #match remainder of the string
/                           #closing pattern delimiter
i                           #case-insensitive pattern modifier
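If \K is unfamiliar, the same replacement can be written with an ordinary capture group: capture everything up to the end of the digit-containing words and replace the whole string with $1. A small sketch comparing the two forms on three of the sample strings (the variable names are illustrative only):

```php
<?php
$samples = [
    'Nikon D90 DSLR (Black) with AF-S 18-105mm VR Kit Lens',
    'HP 15-g221AU (Notebook)',
    'Apple',
];

// \K form: text matched before \K is kept; only the tail matched by .* is replaced with ''
$withK = preg_replace('/^\D*(?: [a-z\d-]*\d[a-z\d-]*)+\K.*/i', '', $samples);

// Capture-group form: match the whole string, then write back only group 1
$withGroup = preg_replace('/^(\D*(?: [a-z\d-]*\d[a-z\d-]*)+).*/i', '$1', $samples);

var_export($withK === $withGroup); // prints true
echo "\n";
var_export($withK);
```

A string with no digit-containing word (like 'Apple') does not match at all, so preg_replace() returns it unchanged in both forms.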

Code:

$productArr=[
    'Dell Inspiron 15 3521 Laptop CDC/ 2GB/ 500GB/ Linux Black Matte Textured Finish',
    'Dell Inspiron 15 3521 Laptop CDC/ 4GB/ 500GB/ Win8 Black',
    'Nikon D90 DSLR (Black) with AF-S 18-105mm VR Kit Lens',
    'Toshiba Satellite L50-B I3010',
    'Lenovo Z50-70',
    'Apple',
    'HP 15-g221AU (Notebook)',
    'Acer Aspire E5-571-56UR Notebook'
];
var_export(array_flip(array_flip(preg_replace('/^\D*(?: [a-z\d-]*\d[a-z\d-]*)+\K.*/i','',$productArr))));

Output:

array (
  1 => 'Dell Inspiron 15 3521',
  2 => 'Nikon D90',
  3 => 'Toshiba Satellite L50-B I3010',
  4 => 'Lenovo Z50-70',
  5 => 'Apple',
  6 => 'HP 15-g221AU',
  7 => 'Acer Aspire E5-571-56UR',
)

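p.s. I am using a double call of array_flip() because:

  1. It will not damage your values (they are all strings here; array_flip() only accepts string and integer values).
  2. It operates faster than array_unique().
  3. It allows the entire method to be chained together in one line.

(You are welcome to use array_unique() if you wish. It returns the same values; the only difference is that array_unique() keeps the key of the first duplicate, while the double flip keeps the key of the last one.)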
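A minimal sketch of that key difference, using a hypothetical three-element array built from the sample output: both approaches keep the same values, but a duplicate ends up under a different key.

```php
<?php
$names = ['Dell Inspiron 15 3521', 'Nikon D90', 'Dell Inspiron 15 3521'];

// Double flip: values become keys (duplicates collapse, the last index wins), then flip back
$flipped = array_flip(array_flip($names));
// $flipped holds 2 => 'Dell Inspiron 15 3521', 1 => 'Nikon D90'

// array_unique(): keeps the first occurrence of each value together with its key
$unique = array_unique($names);
// $unique holds 0 => 'Dell Inspiron 15 3521', 1 => 'Nikon D90'

// After array_values() reindexes both, the results are identical
var_export(array_values($flipped) === array_values($unique)); // prints true
```

If you need a clean 0-based list, wrap either result in array_values().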

Upvotes: 0

Rounin

Reputation: 29463

You can use the custom PHP function below, which uses a regex, to produce the output described in your question.

Depending on the patterns you want to match in your product feed, you may need to tweak the regex a little.

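function getProductNames($productArr) {
    // Start with an empty array so array_unique() still receives an array when the input is empty
    $productModArr = [];

    // Assumes $productArr is a list indexed from 0
    $countproductArr = count($productArr);
    for ($i = 0; $i < $countproductArr; $i++) {
        // Keep the leading text, the first number and an optional second number; drop the rest
        $productModArr[] = preg_replace('/^([^0-9]*)([0-9]+)(\s[0-9]+)?(.*)/', '$1$2$3', $productArr[$i]);
    }

    // Remove duplicates, then reindex the keys from 0
    $productModArr = array_unique($productModArr);
    $productModArr = array_values($productModArr);
    return $productModArr;
}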

$productModArr = getProductNames($productArr);

Explanation of the Regex:

/ ... / - the pattern delimiters

^ - start of the string

([^0-9]*) - [first capture group] zero or more non-digit characters

([0-9]+) - [second capture group] one or more digits

(\s[0-9]+)? - [third (optional) capture group] one whitespace character followed by one or more digits

(.*) - [fourth capture group] the rest of the string, which is discarded because the replacement is only $1$2$3
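To check what each group captures on a given string, preg_match() can fill an array with the full match at index 0 and the groups after it. A sketch on two of the question's strings ($kept is an illustrative name):

```php
<?php
$pattern = '/^([^0-9]*)([0-9]+)(\s[0-9]+)?(.*)/';
$products = [
    'Dell Inspiron 15 3521 Laptop CDC/ 4GB/ 500GB/ Win8 Black',
    'Nikon D90 DSLR (Black) with AF-S 18-105mm VR Kit Lens',
];

$kept = [];
foreach ($products as $product) {
    if (preg_match($pattern, $product, $m)) {
        // $m[1]..$m[4] are the four groups. Group 3 is optional: when it does not
        // participate (and a later group does), PHP reports it as an empty string.
        printf("1=[%s] 2=[%s] 3=[%s]\n", $m[1], $m[2], $m[3]);
        $kept[] = $m[1] . $m[2] . $m[3];
    }
}
// Prints:
// 1=[Dell Inspiron ] 2=[15] 3=[ 3521]
// 1=[Nikon D] 2=[90] 3=[]
```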

=======

Update

Replacing the preg_replace() line in the function above with the following may match the product name and model number in more cases:

$productModArr[] = preg_replace('/^([^0-9]*)([^\s]+)(\s[A-Z]?[0-9]+)?(.*)/', '$1$2$3', $productArr[$i]);
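One way to see the difference is to run both patterns side by side on model numbers that contain letters and hyphens. A sketch using two of the sample strings from the other answer ($before and $after are illustrative names):

```php
<?php
$original = '/^([^0-9]*)([0-9]+)(\s[0-9]+)?(.*)/';
$updated  = '/^([^0-9]*)([^\s]+)(\s[A-Z]?[0-9]+)?(.*)/';

$samples = ['Toshiba Satellite L50-B I3010', 'Lenovo Z50-70'];

$before = [];
$after  = [];
foreach ($samples as $s) {
    // ([0-9]+) stops at the first non-digit, so "-B" and "-70" fall into the discarded tail
    $before[] = preg_replace($original, '$1$2$3', $s);
    // ([^\s]+) takes the whole non-space token, and group 3 allows a leading capital letter
    $after[] = preg_replace($updated, '$1$2$3', $s);
}

print_r($before); // Toshiba Satellite L50, Lenovo Z50
print_r($after);  // Toshiba Satellite L50-B I3010, Lenovo Z50-70
```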

Upvotes: 1

Lisa

Reputation: 54

This works for your three examples, but since there is no standard format for such product names, there are probably many situations where the format is different.

$productArr = array("Dell Inspiron 15 3521 Laptop CDC/ 2GB/ 500GB/ Linux Black Matte Textured Finish","Dell Inspiron 15 3521 Laptop CDC/ 4GB/ 500GB/ Win8 Black","Nikon D90 DSLR (Black) with AF-S 18-105mm VR Kit Lens");
$newArr = array();
foreach ($productArr as $str) {
    $pattern = "/[a-zA-Z\ ]*[ A-Z]?[0-9 ]*/";
    preg_match($pattern, $str, $match);
    // [0-9 ]* also consumes the space after the model number, so trim it off
    $newArr[] = trim($match[0]);
}

$result = array_unique($newArr);
print_r($result);
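A quick check of the raw match this pattern produces: because the character class [0-9 ] includes a space, the match ends with the space that follows the model number. That matters if you later compare the result against an exact string such as "Nikon D90"; trim() removes it.

```php
<?php
$pattern = "/[a-zA-Z\ ]*[ A-Z]?[0-9 ]*/";
preg_match($pattern, 'Nikon D90 DSLR (Black) with AF-S 18-105mm VR Kit Lens', $match);

var_dump($match[0]);       // string(10) "Nikon D90 "
var_dump(trim($match[0])); // string(9) "Nikon D90"
```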

Upvotes: 0
