Reputation: 63
I am currently working on a script to match IT equipment models from different suppliers. The idea is to remove the -XXX numbers at the end, the trailing P, or the P before a dash in the middle of the name. Example models (original name, then cleaned name) are:
DH-HAC-HDBW3802EP-Z HAC-HDBW3802E-Z
DH-HAC-HDBW3802EP-ZH HAC-HDBW3802E-ZH
DH-HAC-HDW1000MP-028 HAC-HDW1000M
DH-HAC-HDW1000RP-028 HAC-HDW1000R
DH-HAC-HDW1100EMP-02 HAC-HDW1100EM
DH-HAC-HDW1100EMP-03 HAC-HDW1100EM
DH-HAC-HDW1100MP HAC-HDW1100M
DH-HAC-HDW1100MP-036 HAC-HDW1100M
DH-HAC-HDW1100RP-028 HAC-HDW1100R
DH-HAC-HDW1100RP-VF HAC-HDW1100R-VF
For now I am using rather complicated code that, I must admit, does work, but I have a deep urge to regex it a little (I know: if it works, don't mess with it). The function that cleans the endings of the names looks like this:
function beautifyDahua($text)
{
    $text = str_replace('DHI-', '', $text);
    $text = str_replace('DH-', '', $text);
    if (empty($text)) {
        return 'n-a';
    } elseif (substr($text, 0, 4) === "IPC-" || substr($text, 0, 4) === "HAC-") {
        // if it begins with IPC- or HAC-, clean further
        $text = str_replace('AP-028', 'A', $text);
        $text = str_replace('AP-036', 'A', $text);
        $text = str_replace('AP', 'A', $text);
        $text = str_replace('BP-028', 'B', $text);
        $text = str_replace('BP-036', 'B', $text);
        $text = str_replace('BP', 'B', $text);
        $text = str_replace('CP-', 'C-', $text);
        $text = str_replace('DP-036', 'D', $text);
        $text = str_replace('DP-', 'D-', $text);
        $text = str_replace('EMP-03', 'EM', $text);
        $text = str_replace('EMP-02', 'EM', $text);
        $text = str_replace('EMP-', 'EM-', $text);
        $text = str_replace('EP-036', 'E', $text);
        $text = str_replace('EP-028', 'E', $text);
        $text = str_replace('EP-03', 'E', $text);
        $text = str_replace('EP-02', 'E', $text);
        $text = str_replace('EP-', 'E-', $text);
        $text = str_replace('EP', 'E', $text);
        $text = str_replace('FP-03', 'F', $text);
        $text = str_replace('FP-02', 'F', $text);
        $text = str_replace('FP-', 'F-', $text);
        $text = str_replace('FP', 'F', $text);
        $text = str_replace('RMP-03', 'RM', $text);
        $text = str_replace('RMP-02', 'RM', $text);
        $text = str_replace('RMP-', 'RM', $text);
        $text = str_replace('RMP', 'RM', $text);
        $text = str_replace('RP-028', 'R', $text);
        $text = str_replace('RP-036', 'R', $text);
        $text = str_replace('RP-', 'R-', $text);
        $text = str_replace('RP', 'R', $text);
        $text = str_replace('SP-036', 'S', $text);
        $text = str_replace('SP-028', 'S', $text);
        $text = str_replace('SP-', 'S-', $text);
        $text = str_replace('SP', 'S', $text);
        $text = str_replace('SLP-03', 'SL', $text);
        $text = str_replace('TP-', 'T-', $text);
        $text = str_replace('MP-036', 'M', $text);
        $text = str_replace('MP-028', 'M', $text);
        $text = str_replace('MP', 'M', $text);
        return $text;
    } else {
        return $text;
    }
}
For the numbers I have a regex like \b-0(\d|\d\d)\b
But for the P situation I am in over my head.
Any advice on how to tackle this?
Upvotes: 1
Views: 67
Reputation: 63
After messing around with @apokryfos's solution I came to
$text = preg_replace("/\b(DHI-|DH-)?(HAC-|IPC-)(\w+\d+)(\w(M|L)?)(P)(\w*)(-?\d+)?/", "$2$3$4", $text);
$text = preg_replace("/\b(DHI-|DH-)?/", "", $text);
But I see that Thomasso's solution works out of the box. I will have to check both against the 1200+ examples I have and see which one works best in my case. Anyway, thank you a lot for your support.
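One way to run that comparison is sketched below. It is only a rough harness: the function names cleanTwoStep, cleanSinglePattern and findMismatches are made up for illustration, the expected pairs are just the ten sample rows from the question, and the real 1200+ list would have to be loaded in their place.

```php
<?php
// Expected pairs copied from the sample table in the question.
$expected = [
    'DH-HAC-HDBW3802EP-Z'  => 'HAC-HDBW3802E-Z',
    'DH-HAC-HDBW3802EP-ZH' => 'HAC-HDBW3802E-ZH',
    'DH-HAC-HDW1000MP-028' => 'HAC-HDW1000M',
    'DH-HAC-HDW1000RP-028' => 'HAC-HDW1000R',
    'DH-HAC-HDW1100EMP-02' => 'HAC-HDW1100EM',
    'DH-HAC-HDW1100EMP-03' => 'HAC-HDW1100EM',
    'DH-HAC-HDW1100MP'     => 'HAC-HDW1100M',
    'DH-HAC-HDW1100MP-036' => 'HAC-HDW1100M',
    'DH-HAC-HDW1100RP-028' => 'HAC-HDW1100R',
    'DH-HAC-HDW1100RP-VF'  => 'HAC-HDW1100R-VF',
];

// Candidate 1 (hypothetical name): the two-step replacement from this post.
function cleanTwoStep(string $text): string
{
    $text = preg_replace("/\b(DHI-|DH-)?(HAC-|IPC-)(\w+\d+)(\w(M|L)?)(P)(\w*)(-?\d+)?/", "$2$3$4", $text);
    return preg_replace("/\b(DHI-|DH-)?/", "", $text);
}

// Candidate 2 (hypothetical name): the single pattern from the other answer.
function cleanSinglePattern(string $text): string
{
    return preg_replace('/(?:DHI?-)?((?:HAC|IPC)-[A-Z0-9]+)(?:P-\d+|P)/', '$1', $text);
}

// Returns input => actual output for every pair the cleaner gets wrong.
function findMismatches(callable $clean, array $expected): array
{
    $mismatches = [];
    foreach ($expected as $input => $want) {
        $got = $clean($input);
        if ($got !== $want) {
            $mismatches[$input] = $got;
        }
    }
    return $mismatches;
}

foreach (['cleanTwoStep', 'cleanSinglePattern'] as $candidate) {
    $bad = findMismatches($candidate, $expected);
    echo $candidate, ': ', count($bad), ' mismatch(es)', PHP_EOL;
    foreach ($bad as $input => $got) {
        echo "  $input => $got", PHP_EOL;
    }
}
```

Listing the mismatching inputs, not only the counts, makes it easier to see which naming patterns each regex misses.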
Upvotes: 0
Reputation: 163577
Your regex \b-0(\d|\d\d)\b for the numbers can be written as -0\d{1,2}. For this match I don't think you need the word boundaries \b.
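As a quick check of that claim, the sketch below runs both forms over three of the sample names from the question. The helper names stripWithBoundaries and stripWithQuantifier are made up for this illustration.

```php
<?php
// Hypothetical helper: strips the "-0NN" code using the original pattern.
function stripWithBoundaries(string $model): string
{
    return preg_replace('/\b-0(\d|\d\d)\b/', '', $model);
}

// Hypothetical helper: strips the "-0NN" code using the shorter pattern.
function stripWithQuantifier(string $model): string
{
    return preg_replace('/-0\d{1,2}/', '', $model);
}

$models = ['HAC-HDW1000MP-028', 'HAC-HDW1100EMP-02', 'HAC-HDW1100RP-VF'];

foreach ($models as $model) {
    // Print the input followed by both results for a side-by-side comparison.
    echo $model, ' => ', stripWithBoundaries($model), ' | ', stripWithQuantifier($model), PHP_EOL;
}
```

The two forms agree on these names. They can differ when more digits follow the code: without the trailing \b, the shorter pattern would also match the first part of a longer number.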
Try it like this:
(?:DHI?-)?(?:IPC|HAC)-HDB?W\d+[A-Z]+\K(?:P-0\d{1,2}|P)
The regex uses \K to reset the starting point of the reported match and matches what comes after.
Then you could replace the selected match with an empty string.
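A minimal sketch of that replacement in PHP follows. The function name stripDahuaSuffix is made up, and removing the DH-/DHI- prefix afterwards with str_replace is an assumption carried over from the question's code, since the \K match itself only covers the P part.

```php
<?php
// Hypothetical wrapper around the \K pattern; the function name is illustrative.
function stripDahuaSuffix(string $model): string
{
    // \K keeps everything before the P out of the reported match,
    // so only "P" or "P-0NN" is replaced with an empty string.
    $model = preg_replace(
        '/(?:DHI?-)?(?:IPC|HAC)-HDB?W\d+[A-Z]+\K(?:P-0\d{1,2}|P)/',
        '',
        $model
    );

    // Assumption: the supplier prefix is still removed separately, as in the question.
    return str_replace(['DHI-', 'DH-'], '', $model);
}

foreach (['DH-HAC-HDBW3802EP-Z', 'DH-HAC-HDW1000MP-028', 'DH-HAC-HDW1100MP'] as $model) {
    echo $model, ' => ', stripDahuaSuffix($model), PHP_EOL;
}
```

For these three inputs the output should match the cleaned names listed in the question.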
Explanation
(?:    Non capturing group
DHI?-    Match DH with optional capital I, then a dash
)?    Close non capturing group and make it optional
(?:    Non capturing group
IPC|HAC    Match IPC or HAC
)    Close non capturing group
-HDB?W    Match dash HD, optional B and W
\d+    Match one or more digits
[A-Z]+    Match one or more uppercase characters
\K    Reset starting point of the reported match
(?:    Non capturing group (This will contain your match)
P-    Match P-
0\d{1,2}    Match 0 and 1 or 2 digits (or use \d{2,3} to match 2 or 3 digits)
|    Or
P    Match P
)    Close non capturing group
Upvotes: 1
Reputation: 23685
Here is the regular expression I propose:
Pattern: (?:DHI?-)?((?:HAC|IPC)-[A-Z0-9]+)(?:P-\d+|P)
Replacement: \1
and its PHP implementation using the preg_replace function:
$text = 'DH-HAC-HDW1000MP-028';
$result = preg_replace('/(?:DHI?-)?((?:HAC|IPC)-[A-Z0-9]+)(?:P-\d+|P)/', '$1', $text);
echo $result; // HAC-HDW1000M
You can see a working demo by visiting this link.
Upvotes: 0
Reputation: 40690
Here's something, but I'm not sure if it will work for you:
preg_replace("/\b(DH-)?(HAC-)(\w+\d+)(\w)(\w*)(-?\d+)?/", "$2$3$4", $input_lines);
So basically it matches words with an optional DH- followed by HAC-, followed by one or more word characters, one or more digits, at least one more word character, and optionally a dash and digits.
Here's the slightly hacky part: because the end optionally matches -\d+ but does not use it in the replacement, that part is stripped out. It does not match -\w, so if trailing characters like that exist they will be kept. However, this will fail if the model name is part of a sentence.
Upvotes: 0