Costin Nicolau

Reputation: 63

regex replace with array

I am currently working on a script to match IT equipment models from different suppliers. The idea is to remove the -XXX numbers at the end, a trailing P, or a P- in the middle of the name. Example models (supplier name on the left, desired result on the right):

DH-HAC-HDBW3802EP-Z     HAC-HDBW3802E-Z     
DH-HAC-HDBW3802EP-ZH    HAC-HDBW3802E-ZH        
DH-HAC-HDW1000MP-028    HAC-HDW1000M        
DH-HAC-HDW1000RP-028    HAC-HDW1000R        
DH-HAC-HDW1100EMP-02    HAC-HDW1100EM       
DH-HAC-HDW1100EMP-03    HAC-HDW1100EM       
DH-HAC-HDW1100MP        HAC-HDW1100M        
DH-HAC-HDW1100MP-036    HAC-HDW1100M        
DH-HAC-HDW1100RP-028    HAC-HDW1100R        
DH-HAC-HDW1100RP-VF     HAC-HDW1100R-VF

For now I am using a rather complicated piece of code that, I must admit, does work, but I have a deep urge to regex it a little (I know: if it works, don't mess with it). The function that cleans the endings of the names looks like this:

function beautifyDahua($text)
{
    $text = str_replace('DHI-', '', $text);
    $text = str_replace('DH-', '', $text);

    if (empty($text)) {
        return 'n-a';
    }

    // If the model begins with IPC or HAC, clean it further
    if (substr($text, 0, 4) === "IPC-" || substr($text, 0, 4) === "HAC-") {
        $text = str_replace('AP-028', 'A', $text);
        $text = str_replace('AP-036', 'A', $text);
        $text = str_replace('AP', 'A', $text);
        $text = str_replace('BP-028', 'B', $text);
        $text = str_replace('BP-036', 'B', $text);
        $text = str_replace('BP', 'B', $text);
        $text = str_replace('CP-', 'C-', $text);
        $text = str_replace('DP-036', 'D', $text);
        $text = str_replace('DP-', 'D-', $text);
        $text = str_replace('EMP-03', 'EM', $text);
        $text = str_replace('EMP-02', 'EM', $text);
        $text = str_replace('EMP-', 'EM-', $text);
        $text = str_replace('EP-036', 'E', $text);
        $text = str_replace('EP-028', 'E', $text);
        $text = str_replace('EP-03', 'E', $text);
        $text = str_replace('EP-02', 'E', $text);
        $text = str_replace('EP-', 'E-', $text);
        $text = str_replace('EP', 'E', $text);
        $text = str_replace('FP-03', 'F', $text);
        $text = str_replace('FP-02', 'F', $text);
        $text = str_replace('FP-', 'F-', $text);
        $text = str_replace('FP', 'F', $text);
        $text = str_replace('RMP-03', 'RM', $text);
        $text = str_replace('RMP-02', 'RM', $text);
        $text = str_replace('RMP-', 'RM', $text);
        $text = str_replace('RMP', 'RM', $text);
        $text = str_replace('RP-028', 'R', $text);
        $text = str_replace('RP-036', 'R', $text);
        $text = str_replace('RP-', 'R-', $text);
        $text = str_replace('RP', 'R', $text);
        $text = str_replace('SP-036', 'S', $text);
        $text = str_replace('SP-028', 'S', $text);
        $text = str_replace('SP-', 'S-', $text);
        $text = str_replace('SP', 'S', $text);
        $text = str_replace('SLP-03', 'SL', $text);
        $text = str_replace('TP-', 'T-', $text);
        $text = str_replace('MP-036', 'M', $text);
        $text = str_replace('MP-028', 'M', $text);
        $text = str_replace('MP', 'M', $text);
        return $text;
    }

    return $text;
}
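Since the title mentions arrays: str_replace() also accepts arrays for its search and replace arguments and applies each pair in turn, so the rule list above could be kept as data instead of one call per line. This is a minimal sketch, assuming a hypothetical helper name beautifyDahuaArray() and using only a subset of the original rules; it is not a drop-in replacement for the full function.

```php
<?php
// Sketch only: beautifyDahuaArray() is a hypothetical name, and the rule
// table below is a subset of the rules in beautifyDahua().
function beautifyDahuaArray(string $text): string
{
    // Strip the supplier prefixes (DHI- before DH-, as in the original).
    $text = str_replace(['DHI-', 'DH-'], '', $text);
    if (empty($text)) {
        return 'n-a';
    }

    $prefix = substr($text, 0, 4);
    if ($prefix !== 'IPC-' && $prefix !== 'HAC-') {
        return $text;
    }

    // Order matters: str_replace() applies each search/replace pair in turn,
    // so longer patterns (EMP-02) must come before shorter ones (EP, MP).
    $rules = [
        'EMP-03' => 'EM', 'EMP-02' => 'EM', 'EMP-' => 'EM-',
        'EP-028' => 'E',  'EP-'    => 'E-', 'EP'   => 'E',
        'RP-028' => 'R',  'RP-'    => 'R-', 'RP'   => 'R',
        'MP-036' => 'M',  'MP-028' => 'M',  'MP'   => 'M',
    ];

    return str_replace(array_keys($rules), array_values($rules), $text);
}

echo beautifyDahuaArray('DH-HAC-HDW1100EMP-02'), PHP_EOL; // HAC-HDW1100EM
```

Adding a rule then means adding one array entry in the right position, but the ordering constraint of the original is still there.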

For the numbers I have a regex like \b-0(\d|\d\d)\b, but for the P situation I am in over my head.

Any advice on how to tackle this?

Upvotes: 1

Views: 67

Answers (4)

Costin Nicolau

Reputation: 63

After experimenting with @apokryfos's solution I came up with:

$text = preg_replace("/\b(DHI-|DH-)?(HAC-|IPC-)(\w+\d+)(\w(M|L)?)(P)(\w*)(-?\d+)?/", "$2$3$4", $text);
$text = preg_replace("/\b(DHI-|DH-)?/", "", $text);

But I see that Tommaso's solution works out of the box. I will have to check both against the 1200+ examples I have and see which one works best in my case. Anyway, thank you a lot for your support.
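To compare the two candidates against the full list, a small harness that reports mismatches can help. This is a minimal sketch: the function names cleanTwoStep(), cleanOnePattern() and mismatches() are hypothetical, the patterns are copied from the two solutions in this thread, and only the ten sample pairs from the question are included (the real 1200+ pairs would have to be loaded from wherever they are stored).

```php
<?php
// Hypothetical wrapper around the two-step replacement above.
function cleanTwoStep(string $text): string
{
    $text = preg_replace('/\b(DHI-|DH-)?(HAC-|IPC-)(\w+\d+)(\w(M|L)?)(P)(\w*)(-?\d+)?/', '$2$3$4', $text);
    return preg_replace('/\b(DHI-|DH-)?/', '', $text);
}

// Hypothetical wrapper around Tommaso Belluzzo's single pattern.
function cleanOnePattern(string $text): string
{
    return preg_replace('/(?:DHI?-)?((?:HAC|IPC)-[A-Z0-9]+)(?:P-\d+|P)/', '$1', $text);
}

// Returns input => actual output for every sample that was not cleaned as expected.
function mismatches(callable $clean, array $samples): array
{
    $bad = [];
    foreach ($samples as $input => $expected) {
        $got = $clean($input);
        if ($got !== $expected) {
            $bad[$input] = $got;
        }
    }
    return $bad;
}

// Supplier model => expected cleaned model (the pairs listed in the question).
$samples = [
    'DH-HAC-HDBW3802EP-Z'  => 'HAC-HDBW3802E-Z',
    'DH-HAC-HDBW3802EP-ZH' => 'HAC-HDBW3802E-ZH',
    'DH-HAC-HDW1000MP-028' => 'HAC-HDW1000M',
    'DH-HAC-HDW1000RP-028' => 'HAC-HDW1000R',
    'DH-HAC-HDW1100EMP-02' => 'HAC-HDW1100EM',
    'DH-HAC-HDW1100EMP-03' => 'HAC-HDW1100EM',
    'DH-HAC-HDW1100MP'     => 'HAC-HDW1100M',
    'DH-HAC-HDW1100MP-036' => 'HAC-HDW1100M',
    'DH-HAC-HDW1100RP-028' => 'HAC-HDW1100R',
    'DH-HAC-HDW1100RP-VF'  => 'HAC-HDW1100R-VF',
];

foreach (['cleanTwoStep', 'cleanOnePattern'] as $name) {
    printf("%s: %d mismatch(es)\n", $name, count(mismatches($name, $samples)));
}
```

Both report 0 mismatches on these ten samples; only the full list can show which one handles the remaining models better.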

Upvotes: 0

The fourth bird

Reputation: 163577

Your regex \b-0(\d|\d\d)\b for the numbers can be written as -0\d{1,2}. For this match I don't think you need the word boundaries \b.
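On the sample models both forms behave the same; a difference only appears when a longer run of digits follows the dash, because without the trailing \b the short form can match part of it. A minimal sketch comparing them, where the helper names and the string HAC-X-0289 are made up for illustration (it is not a real model):

```php
<?php
// The numeric-suffix regex from the question (word boundaries on both ends).
function stripNumberBounded(string $s): string
{
    return preg_replace('/\b-0(\d|\d\d)\b/', '', $s);
}

// The shorter form suggested in this answer (no word boundaries).
function stripNumberShort(string $s): string
{
    return preg_replace('/-0\d{1,2}/', '', $s);
}

foreach (['DH-HAC-HDW1000MP-028', 'DH-HAC-HDW1100EMP-02', 'HAC-X-0289'] as $m) {
    echo stripNumberBounded($m), ' | ', stripNumberShort($m), PHP_EOL;
}
// DH-HAC-HDW1000MP | DH-HAC-HDW1000MP
// DH-HAC-HDW1100EMP | DH-HAC-HDW1100EMP
// HAC-X-0289 | HAC-X9
```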

Try it like this:

(?:DHI?-)?(?:IPC|HAC)-HDB?W\d+[A-Z]+\K(?:P-0\d{1,2}|P)

The regex uses \K to reset the starting point of the reported match and matches what comes after. Then you could replace the selected match with an empty string.

Explanation

  • (?: Non capturing group
    • DHI?- Match DH, an optional capital I, and a dash
  • )? Close non capturing group
  • (?: Non capturing group
    • IPC|HAC Match IPC or HAC
  • ) Close non capturing group
  • -HDB?W Match a dash, HD, an optional B, and W
  • \d+ Match one or more digits
  • [A-Z]+ Match one or more uppercase characters
  • \K Reset starting point of the reported match
  • (?: Non capturing group (This will contain your match)
    • P- Match P-
    • 0\d{1,2} Match 0 followed by 1 or 2 digits (or use \d{2,3} instead to match any 2 or 3 digits)
    • | Or
    • P Match P
  • ) Close non capturing group
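Because the optional DH-/DHI- prefix sits before \K, it is not part of the reported match and stays in the result; the question's function strips it separately. A minimal sketch of that combination, where cleanWithK() is a hypothetical name and the prefix removal is an extra step not part of this answer's pattern:

```php
<?php
// Remove the P / P-0nn part selected by the \K pattern, then drop the
// DH-/DHI- prefix, which the \K pattern leaves untouched.
function cleanWithK(string $text): string
{
    $text = preg_replace('/(?:DHI?-)?(?:IPC|HAC)-HDB?W\d+[A-Z]+\K(?:P-0\d{1,2}|P)/', '', $text);
    return preg_replace('/^DHI?-/', '', $text);
}

echo cleanWithK('DH-HAC-HDW1100EMP-02'), PHP_EOL; // HAC-HDW1100EM
echo cleanWithK('DH-HAC-HDBW3802EP-Z'), PHP_EOL;  // HAC-HDBW3802E-Z
```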

Demo php

Upvotes: 1

Tommaso Belluzzo

Reputation: 23685

Here is the regular expression I propose:

Pattern:     (?:DHI?-)?((?:HAC|IPC)-[A-Z0-9]+)(?:P-\d+|P)
Replacement: \1

and its PHP implementation using the preg_replace function:

$text = 'DH-HAC-HDW1000MP-028';            
$result = preg_replace('/(?:DHI?-)?((?:HAC|IPC)-[A-Z0-9]+)(?:P-\d+|P)/', '$1', $text);
echo $result; // HAC-HDW1000M

You can see a working demo by visiting this link.

Upvotes: 0

apokryfos

Reputation: 40690

Here's something, but I'm not sure it will work for you:

preg_replace("/\b(DH-)?(HAC-)(\w+\d+)(\w)(\w*)(-?\d+)?/", "$2$3$4", $input_lines);

So basically it matches words with an optional DH- followed by HAC-, then word characters ending in one or more digits, then at least one more word character, optionally followed by -numbers. Only the first character after the digits is kept in the replacement.

Here's the slightly hacky part: because the end optionally matches -\d+ but does not use it in the replacement, that suffix is stripped out. It does not match -\w, so trailing characters such as -VF are kept. However, this will fail if the model name is part of a sentence.
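Since only $4 (a single character) survives after the digits, a two-letter suffix such as EM loses its M together with the P, which is why the question author's follow-up adds (M|L)?. A minimal sketch showing this, with cleanSingleLetter() as a hypothetical wrapper around the pattern from this answer:

```php
<?php
// Keeps $2 (HAC-), $3 (up to the last digit) and $4 (one character);
// $5 (\w*) and $6 (-digits) are dropped.
function cleanSingleLetter(string $input): string
{
    return preg_replace('/\b(DH-)?(HAC-)(\w+\d+)(\w)(\w*)(-?\d+)?/', '$2$3$4', $input);
}

echo cleanSingleLetter('DH-HAC-HDW1000MP-028'), PHP_EOL; // HAC-HDW1000M
echo cleanSingleLetter('DH-HAC-HDW1100EMP-02'), PHP_EOL; // HAC-HDW1100E (wanted HAC-HDW1100EM)
```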

Upvotes: 0
