Reputation: 2955
We have thousands of rows of data containing article numbers in all sorts of formats, and I need to split the main article number from a size indicator. There is (almost) always a dot, dash or underscore before the last few characters (not always 2).
In short: the data is main article number + size indicator; the separator varies but is always one of three characters: . - _
Question: how do I split the main article number from the size indicator? The regex below, which I built based on some Googling, isn't working.
preg_match('/^(.*)[\.-_]([^\.-_]+)$/', $sku, $matches);
Sample data + expected result
AR.110052.15-40 [AR.110052.15 & 40]
BI.533.41-41 [BI.533.41 & 41]
CG.00554.000-39 [CG.00554.000 & 39]
LL.PX00.SC004-40 [LL.PX00.SC004 & 40]
LOS.HAPPYSOCKS.1X [LOS.HAPPYSOCKS & 1X]
MI.PMNH300043-XXXXL [MI.PMNH300043 & XXXXL]
Upvotes: 1
Views: 1743
Reputation: 47854
Use preg_split() instead of preg_match() because:
- preg_split() returns exactly the desired array, whereas preg_match() carries the unnecessary fullstring match in its returned array.
- You can limit the number of elements produced (like you would with explode()'s limit parameter).
- No capture groups are needed at all.
Greedily match zero or more characters, then just before matching the last-occurring delimiter, restart the fullstring match with \K. This effectively uses the matched delimiter as the character to explode on, and it is "lost" in the explosion.
Code: (Demo)
$strings = [
'AR.110052.15-40',
'BI.533.41-41',
'CG.00554.000-39',
'LL.PX00.SC004-40',
'LOS.HAPPYSOCKS.1X',
'MI.PMNH300043-XXXXL',
];
foreach ($strings as $string) {
var_export(preg_split('~.*\K[._-]~', $string, 2));
echo "\n";
}
Output:
array (
0 => 'AR.110052.15',
1 => '40',
)
array (
0 => 'BI.533.41',
1 => '41',
)
array (
0 => 'CG.00554.000',
1 => '39',
)
array (
0 => 'LL.PX00.SC004',
1 => '40',
)
array (
0 => 'LOS.HAPPYSOCKS',
1 => '1X',
)
array (
0 => 'MI.PMNH300043',
1 => 'XXXXL',
)
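Since the question says a separator is only "almost" always present, here is a minimal sketch of how the same call could be wrapped to cope with rows that have none. The helper name splitSku() and the choice of null for a missing size are assumptions for illustration, not part of the approach above.

```php
<?php
// Hypothetical helper: split a SKU on its last '.', '-' or '_'.
function splitSku(string $sku): array
{
    // Same pattern and limit as used above.
    $parts = preg_split('~.*\K[._-]~', $sku, 2);

    // Without a separator, preg_split() returns the whole string as the
    // only element; pad with null so callers always get [article, size].
    if ($parts === false || count($parts) < 2) {
        return [$sku, null];
    }

    return $parts;
}

[$article, $size] = splitSku('BI.533.41-41');
echo $article, ' & ', $size, "\n"; // BI.533.41 & 41
```

Returning a fixed two-element shape keeps array destructuring safe on the odd rows; whether null, an empty string or an exception is the right signal depends on how the rows are processed afterwards.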
Upvotes: 0
Reputation: 626689
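In [\.-_], the hyphen creates a range from . to _, which covers digits and uppercase letters but not - itself. You need to move the - to the end of the character class to make the regex engine parse it as a literal hyphen:
^(.*)[._-]([^._-]+)$
See the regex demo. Actually, even ^(.+)[._-](.+)$ will work.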
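A minimal sketch for checking this in PHP. The first two lines compare the two character classes on a single character; matchSku() is a hypothetical wrapper name used only for illustration.

```php
<?php
// '5' sits inside the accidental range of the original class, '-' does not.
var_dump(preg_match('/^[\.-_]$/', '5')); // int(1)
var_dump(preg_match('/^[\.-_]$/', '-')); // int(0)

// Hypothetical wrapper around the corrected pattern.
function matchSku(string $sku): ?array
{
    // Group 1: main article number, group 2: size indicator.
    if (preg_match('/^(.*)[._-]([^._-]+)$/', $sku, $matches) !== 1) {
        return null; // no separator found
    }

    return [$matches[1], $matches[2]];
}

var_export(matchSku('AR.110052.15-40'));
```

With the original class, Group 2 cannot consume digits or uppercase letters at all, which is why sizes such as 40 or XXXXL never matched.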
Pattern details for ^(.*)[._-]([^._-]+)$:
^ - matches the start of string
(.*) - Group 1: captures any 0+ chars, as many as possible, up to the last...
[._-] - either . or _ or -
([^._-]+) - Group 2: one or more chars other than ., _ and -
$ - end of string.
Upvotes: 2