Will

Reputation: 5590

Regular Expression: Why am I getting matches here when I expect none?

I have a regular expression that should match a run of 2-3 consecutive uppercase letters, beginning with P, M, C, or E and ending in T. The regular expression, executed in PHP, looks like this:

<?php

# The string to match against
$DT = 'Sat, 26 Nov 2011 21:04:19 GMT';

# Returns "MT" as a match
preg_match('/[PMCE][A-Z]?T/', $DT, $matches);

# I've also tried this -- returns "M" as a match
preg_match('/P|M|C|E[A-Z]?T/', $DT, $matches);

The second character is marked as optional with the ? but shouldn't it only be capable of returning PT, MT, CT, ET, or P*T, M*T, C*T, E*T?

I thought this regular expression should not match the string above. I've already worked around it with non-regex methods, but I'd like to know what the heck I'm doing wrong. How is it possible that "MT" is a match for the first expression, and "M" for the second?

In English, I thought they both read: "The character P, M, C, or E, possibly followed by any A-Z character, followed by a T."

Upvotes: 1

Views: 93

Answers (3)

jakx

Reputation: 758

preg_match('/[PMCE][A-Z]?T/', $DT, $matches);


preg_match('/P|M|C|E[A-Z]?T/', $DT, $matches);

Both of these are matching inside the "GMT" at the end of the string. If you want the timezone to be its own word, make the pattern match a leading space, like this:

preg_match('/ [PMCE][A-Z]?T/', $DT, $matches);
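A quick sketch of how the leading-space version behaves. The `$pst` and `$bare` strings are made-up inputs for illustration, not part of the question. Note that the space becomes part of the match, and that a timezone at the very start of a string is not found:

```php
<?php
$gmt  = 'Sat, 26 Nov 2011 21:04:19 GMT';
$pst  = 'Sat, 26 Nov 2011 13:04:19 PST'; // hypothetical input
$bare = 'PST';                           // hypothetical input, no leading space

// "GMT" no longer matches: the space is followed by G, which is not in [PMCE].
$spaceGmt = preg_match('/ [PMCE][A-Z]?T/', $gmt, $m1);

// Matches, but the captured text includes the leading space.
$spacePst = preg_match('/ [PMCE][A-Z]?T/', $pst, $m2);

// No match: there is no space before the P.
$spaceBare = preg_match('/ [PMCE][A-Z]?T/', $bare, $m3);

var_dump($spaceGmt, $spacePst, $m2, $spaceBare);
```

If the leading space is unwanted in the result, `trim($m2[0])` is one way to drop it.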

Upvotes: 2

Artefacto

Reputation: 97835

The second character is marked as optional with the ? but shouldn't it only be capable of returning PT, MT, CT, ET, or P*T, M*T, C*T, E*T?

Sure, but it's returning MT, which, as you say, is a possible match. I think your problem is that you don't expect preg_match to start a match attempt in the middle of the timezone identifier. If you want to rule that out, you have to say so explicitly:

preg_match('/\b[PMCE][A-Z]?T/', $DT, $matches);

\b matches a word boundary.
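To see the difference, here is a small sketch comparing the unanchored pattern with the `\b` version. The `$pst` string is a hypothetical second input added for contrast, not something from the question:

```php
<?php
$gmt = 'Sat, 26 Nov 2011 21:04:19 GMT';
$pst = 'Sat, 26 Nov 2011 13:04:19 PST'; // hypothetical input

// Without \b the engine may start matching at the M inside "GMT".
$unanchored = preg_match('/[PMCE][A-Z]?T/', $gmt, $m1);

// With \b the match must begin at a word boundary. There is no
// boundary between G and M, so "GMT" is rejected.
$anchoredGmt = preg_match('/\b[PMCE][A-Z]?T/', $gmt, $m2);

// "PST" starts at a word boundary with P, so it still matches.
$anchoredPst = preg_match('/\b[PMCE][A-Z]?T/', $pst, $m3);

var_dump($unanchored, $m1, $anchoredGmt, $m2, $anchoredPst, $m3);
```

The pattern only anchors the start. If longer words beginning with these letters could appear in the input, adding a trailing `\b` as well would probably be worth considering.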

Upvotes: 2

LukeH

Reputation: 269558

The P|M|C|E[A-Z]?T expression translates to something like P or M or C or E[A-Z]?T, which is why it's quite happy to match the single "M".

If you want your second regex to behave more like the first then you'll need to group the or-ed characters: (P|M|C|E)[A-Z]?T should do it, but I prefer your original version anyway.
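A minimal sketch of the precedence difference, run against the string from the question. The variable names are only illustrative:

```php
<?php
$dt = 'Sat, 26 Nov 2011 21:04:19 GMT';

// Alternation has the lowest precedence, so this reads as
// "P" or "M" or "C" or "E[A-Z]?T"; the lone "M" in GMT is enough.
preg_match('/P|M|C|E[A-Z]?T/', $dt, $ungrouped);

// Grouping limits the alternation to the first character.
preg_match('/(P|M|C|E)[A-Z]?T/', $dt, $grouped);

// The character class is equivalent to the grouped form here.
preg_match('/[PMCE][A-Z]?T/', $dt, $class);

var_dump($ungrouped[0], $grouped[0], $class[0]);
```

The grouped form also fills `$grouped[1]` with the captured first letter. Neither the grouped nor the character-class version stops the match from starting inside "GMT"; that is a separate issue from the precedence one.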

Upvotes: 2
