Kai23

Reputation: 1576

Regex for PHP seems simple but is killing me

I'm trying to do a replacement in a string with a regex, and I really hope the community can help me.

I have this string :

031,02a,009,a,aaa,AZ,AZE,02B,975,135

My goal is to remove everything that does not match this regex:

[09][0-9]{2}|[09][0-9][A-Za-z]

i.e.

a,aaa,AZ,AZE,135

(to see it in action : http://regexr.com?3795f )

My final goal is to preg_replace the first string to only get

031,02a,009,02B,975


I'm open to any solution, but I admit I'd really like to make this work with a preg_replace if possible (it has become something of a personal challenge).

Thanks for any help!

Upvotes: 3

Views: 124

Answers (5)

Andrew Cheong

Reputation: 30283

As @Taemyr pointed out in the comments, my previous solution (using a lookbehind assertion) was incorrect: it would consume 3 characters at a time, even though the substrings aren't always 3 characters long.

Let's use a lookahead assertion instead to get around this:

'/(^|,)(?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]*/'

The above matches the beginning of the string or a comma, then checks that what follows does not match one of the two forms you've specified to keep, and given that this condition passes, matches as many non-comma characters as possible.

However, this is identical to @anubhava's solution, meaning it has the same weakness, in that it can leave a leading comma in some cases. See this Ideone demo.
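A minimal sketch of the leading-comma case described above. The second input is a made-up example (not from the question), chosen so that the first item is one of those to remove; the variable names are illustrative only.

```php
<?php
// Lookahead-based pattern from this answer.
$pattern = '/(^|,)(?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]*/';

// Sample string from the question: the first item is kept, so no stray comma.
$question = '031,02a,009,a,aaa,AZ,AZE,02B,975,135';
$fromQuestion = preg_replace($pattern, '', $question);
echo $fromQuestion, "\n"; // 031,02a,009,02B,975

// Hypothetical input whose first item must be removed.
$leading = 'a,031,aaa,02B';
$fromLeading = preg_replace($pattern, '', $leading);
echo $fromLeading, "\n"; // ,031,02B  (leading comma left behind)
```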

Calling ltrim to strip the comma is the clean way to go there, but then again, if you were looking for the "clean way to go," you wouldn't be trying to use a single preg_replace to begin with, right? Your question is whether it's possible to do this without using any other PHP functions.

The answer is yes. We can take

'/(^|,)foo/'

and distribute the alternation,

'/^foo|,foo/'

so that we can tack on the extra comma we wish to capture only in the first case, i.e.

'/^foo,|,foo/'

That's going to be one hairy expression when we substitute foo with our actual regex, isn't it? Thankfully, PHP supports recursive patterns, so we can reuse the first group with (?1) and rewrite the above as

'/^(foo),|,(?1)/'

And there you have it. Replacing foo with the actual pattern, we get

'/^((?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]*),|,(?1)/'

which indeed works, as shown in this second Ideone demo.
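As a rough check, the pattern can be run against the string from the question and against two made-up inputs (both hypothetical, not from the question) that start with unwanted items.

```php
<?php
// Final pattern from this answer: the first branch removes a leading item together
// with its trailing comma; the second removes a comma plus the item after it,
// reusing group 1 through the (?1) subroutine call.
$pattern = '/^((?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]*),|,(?1)/';

$fromQuestion = preg_replace($pattern, '', '031,02a,009,a,aaa,AZ,AZE,02B,975,135');
echo $fromQuestion, "\n"; // 031,02a,009,02B,975

// Hypothetical input with one unwanted item at the start.
$oneLeading = preg_replace($pattern, '', 'a,031,aaa,02B');
echo $oneLeading, "\n"; // 031,02B

// Hypothetical input with two unwanted items at the start.
$twoLeading = preg_replace($pattern, '', 'a,b,031');
echo $twoLeading, "\n"; // b,031
```

The last case suggests a limit of the single-expression approach: ^ matches only at the very start of the subject, so when several unwanted items lead the string, only the first one is removed by that branch.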


Let's take some time here to simplify your expression, though. [0-9] is equivalent to \d, and you can use case-insensitive matching by adding /i, like so:

'/^((?![09]\d{2}|[09]\d[a-z])[^,]*),|,(?1)/i'

You might even compact the inner alternation:

'/^((?![09]\d(\d|[a-z]))[^,]*),|,(?1)/i'
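A quick way to convince yourself that the shorter forms behave the same, at least on the string from the question, is to run all three side by side. This is only a sketch; the variable names are illustrative.

```php
<?php
$input = '031,02a,009,a,aaa,AZ,AZE,02B,975,135';

// The three forms of the pattern given above, from longest to most compact.
$patterns = [
    '/^((?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]*),|,(?1)/',
    '/^((?![09]\d{2}|[09]\d[a-z])[^,]*),|,(?1)/i',
    '/^((?![09]\d(\d|[a-z]))[^,]*),|,(?1)/i',
];

$results = [];
foreach ($patterns as $pattern) {
    $results[] = preg_replace($pattern, '', $input);
}

// All three print the same line for this input.
foreach ($results as $result) {
    echo $result, "\n";
}
```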

Upvotes: 4

Matteo Tassinari

Reputation: 18584

Try this:

preg_replace('/(^|,)[1-8a-z][^,]*/i', '', $string);

This removes every substring that begins at the start of the string or at a comma, continues with a disallowed first character, and runs up to (but not including) the next comma.

As per @GeoffreyBachelet's suggestion, to remove residual commas, you should do:

trim(preg_replace('/(^|,)[1-8a-z][^,]*/i', '', $string), ',');
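Note that this pattern only inspects the first character of each item. The sketch below compares it with a lookahead built from the regex in the question, using a made-up input (not from the question) where the two disagree.

```php
<?php
// First-character test from this answer.
$firstChar = '/(^|,)[1-8a-z][^,]*/i';
// Lookahead-based test built from the regex in the question.
$lookahead = '/(^|,)(?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]*/';

// Hypothetical input: "0ab" starts with an allowed character,
// but its second character is not a digit.
$input = '0ab,031,aaa';

$byFirstChar = trim(preg_replace($firstChar, '', $input), ',');
$byLookahead = trim(preg_replace($lookahead, '', $input), ',');

echo $byFirstChar, "\n"; // 0ab,031
echo $byLookahead, "\n"; // 031
```

For the string in the question both give the same result, so the simpler test may be enough if the input is known to be that regular.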

Upvotes: 0

anubhava

Reputation: 785561

This should work for you:

$s = '031,02a,009,a,aaa,AZ,AZE,02B,975,135';
echo ltrim(preg_replace('/(^|,)(?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]+/', '', $s), ',');

OUTPUT:

031,02a,009,02B,975

Upvotes: 1

Geoffrey Bachelet

Reputation: 4317

Try matching what you want to keep and then joining it with commas:

preg_match_all('/[09][0-9]{2}|[09][0-9][A-Za-z]/', $input, $matches);
// $matches[0] holds the full-pattern matches.
$result = implode(',', $matches[0]);

The problem you'll face with preg_replace is the extra commas you'll have to strip, because you don't just want to remove aaa: you actually want to remove "aaa," or ",aaa". And what happens when there are things to remove at both the beginning and the end of the string? You can't just say "I'll strip the comma before each item", because that can leave an extra comma at the beginning of the string, and stripping the comma after can leave one at the end. So basically, unless you want to mess with lookaheads and/or lookbehinds, you'd better do this in two steps.
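A minimal sketch of the keep-and-join approach, wrapped in a hypothetical helper named keepMatchingItems (not part of this answer). It assumes an item should be kept only when the whole item matches, which the question does not state explicitly; the two lookarounds are added here to enforce that.

```php
<?php
// Hypothetical helper: keep only whole items that match the wanted pattern.
function keepMatchingItems(string $input): string
{
    // (?<![^,]) : preceded by a comma or the start of the string.
    // (?![^,])  : followed by a comma or the end of the string.
    $pattern = '/(?<![^,])(?:[09][0-9]{2}|[09][0-9][A-Za-z])(?![^,])/';
    preg_match_all($pattern, $input, $matches);

    // $matches[0] holds the full-pattern matches.
    return implode(',', $matches[0]);
}

$fromQuestion = keepMatchingItems('031,02a,009,a,aaa,AZ,AZE,02B,975,135');
echo $fromQuestion, "\n"; // 031,02a,009,02B,975

// "1031" contains "031" but is not itself a wanted item.
$partial = keepMatchingItems('1031,02a');
echo $partial, "\n"; // 02a
```

Without the lookarounds, the unanchored pattern would also pick "031" out of "1031", which may or may not be what you want.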

Upvotes: 1

SteeveDroz

Reputation: 6136

Try it in several steps:

$newList = array();
foreach (explode(',', $list) as $element) {
  // Keep only the elements that match the wanted pattern.
  if (preg_match('/[09][0-9]{2}|[09][0-9][A-Za-z]/', $element)) {
    $newList[] = $element;
  }
}
$list = implode(',', $newList);

You still have your regex, see! Personal challenge completed.
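The same split, filter, and join steps can probably be written more compactly with preg_grep. This is only a sketch: the ^ and $ anchors are an addition, on the assumption that each element should match the pattern as a whole.

```php
<?php
$list = '031,02a,009,a,aaa,AZ,AZE,02B,975,135';

// preg_grep() returns the array elements that match the pattern.
// The ^ and $ anchors are an addition here so that the whole element must match.
$kept = preg_grep('/^(?:[09][0-9]{2}|[09][0-9][A-Za-z])$/', explode(',', $list));

$list = implode(',', $kept);
echo $list, "\n"; // 031,02a,009,02B,975
```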

Upvotes: 1
