Johny

Reputation: 359

Building a regex expression for PHP

I am stuck trying to create a regex that will allow for letters, numbers, and the following chars: _ - ! ? . ,

Here is what I have so far:

/^[-\'a-zA-Z0-9_!?,.\s]+$/      //not escaping the ?

and this version too:

/^[-\'a-zA-Z0-9_!\?,.\s]+$/     //attempting to escape the ? 

Neither of these seems to match the following:

"Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?"

Can somebody point out what I am doing wrong? I must point out that my script takes the user input (the paragraph in quotes in this case) and strips all white space so actual input has no white space.

Thanks!

UPDATE: Thanks to Lix's advice, this is what I have so far:

/^[-\'a-zA-Z0-9_!\?,\.\s]+$/

However, it's still not working??

UPDATE 2: OK, based on the feedback, here is what's happening. The user enters a string, and I run it through the following functions:

$comment = preg_replace('/\s+/', '', htmlspecialchars(strip_tags(trim($user_comment_orig))));

So in the end, the user input is just one long string of characters without any spaces. That string is then checked with:

preg_match("@^[-_!?.,a-zA-Z0-9]+$@",$comment) 

What could possibly be causing trouble here?
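One thing worth checking in the pipeline above: the character class passed to preg_match has no apostrophe, so any input containing "It's" cannot pass, whatever else is going on. The following is a minimal sketch of that effect; the variable names ($sample, $without, $with) are illustrative and not part of the original script.

```php
<?php
// Illustrative input; these variable names are assumptions, not the asker's code.
$sample = "It's getting pretty frustrating?";

// Same whitespace stripping as in the question.
$comment = preg_replace('/\s+/', '', trim($sample));

// Pattern from the question: there is no apostrophe in the class.
$without = "@^[-_!?.,a-zA-Z0-9]+$@";
// The same pattern with the apostrophe added.
$with = "@^[-'_!?.,a-zA-Z0-9]+$@";

var_dump(preg_match($without, $comment)); // int(0)
var_dump(preg_match($with, $comment));    // int(1)
```

Also keep in mind that htmlspecialchars can introduce characters such as &, # and ; (since PHP 8.1 it encodes single quotes by default), so it may be simpler to validate the raw input first and escape it afterwards.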

FINAL UPDATE:

Ended up using this regex:

"@[-'A-Z0-9_?!,.]+@i"

Thanks all! lol, y'all are going to kill me once you find out where my mistake was!

Ok, so I had this piece of code:

if(!preg_match($pattern,$comment) || strlen($comment) < 2 || strlen($comment) > 60){

GEEZ!!! I never bothered to look at the strlen part of the code. Of course it was going to fail every time...I only allowed 60 chars!!!!
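For anyone hitting the same thing: a combined condition hides which check actually failed. Below is a rough sketch of splitting the checks so the failure reason is visible. The function validateComment and its messages are hypothetical, and the pattern is an anchored variant of the one above.

```php
<?php
// Hypothetical helper, not the original code: reports which check failed.
function validateComment(string $comment, int $min = 2, int $max = 60): string
{
    // Anchored so the whole string must consist of allowed characters.
    if (!preg_match("@^[-'A-Z0-9_?!,.]+$@i", $comment)) {
        return 'bad characters';
    }
    $length = strlen($comment);
    if ($length < $min) {
        return 'too short';
    }
    if ($length > $max) {
        return 'too long';
    }
    return 'ok';
}

$input = "Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least.";
$comment = preg_replace('/\s+/', '', $input);

echo validateComment($comment), "\n";         // too long
echo validateComment($comment, 2, 500), "\n"; // ok
```

With the default limit of 60, the sample fails on length alone even though every character is allowed.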

Upvotes: 0

Views: 150

Answers (4)

karllindmark

Reputation: 6071

I got the following code to work as expected (running PHP 5):

<?php
    $pattern = "@[-'A-Z0-9_?!,.\s]+@i";
    $string = "Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?";

    $results = array();
    preg_match($pattern, $string, $results);

    echo '<pre>';
    print_r($results);
    echo '</pre>';
?>

The output from print_r($results) was as follows:

Array
(
    [0] => Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?
)

Tested on http://writecodeonline.com/php/.
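One caveat with a pattern that has no ^ and $ anchors: preg_match succeeds as soon as any run of allowed characters is found, so it cannot be used on its own to reject input. A small sketch of the difference, using illustrative variable names and sample strings that are not from the original post:

```php
<?php
// Unanchored: succeeds if any substring consists of allowed characters.
$unanchored = "@[-'A-Z0-9_?!,.\s]+@i";
// Anchored: the entire string must consist of allowed characters.
$anchored = "@^[-'A-Z0-9_?!,.\s]+$@i";

$good = "Hey look, an underscore_";
$bad  = 'Hello <b>world</b> #1';

var_dump(preg_match($unanchored, $good)); // int(1)
var_dump(preg_match($anchored, $good));   // int(1)

// The unanchored pattern still "matches" because it finds "Hello ".
var_dump(preg_match($unanchored, $bad));  // int(1)
var_dump(preg_match($anchored, $bad));    // int(0)
```

If the goal is validation rather than extraction, the anchored form is probably what you want.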

Upvotes: 1

fge

Reputation: 121712

The only characters with a special meaning within a character class are:

  • the dash (since it delimits ranges), unless it appears at the beginning or end of the class (where it cannot be part of a range),
  • the caret, but only as the first character (where it negates the class),
  • the closing bracket,
  • the backslash.

In "pure regex parlance", your character class can be written as:

[-_!?.,a-zA-Z0-9\s]

Now, you need to escape whatever needs to be escaped according to your language and how strings are written. Given that this is PHP, you can take the above sample as is. Note that \s is interpreted in character classes as well, so this will match anything which is matched by \s outside of a character class.

While some manuals recommend using escapes for safety, knowing the general regex rules for character classes and applying them leads to shorter and easier to read results ;)
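These rules are easy to check directly. The sketch below assumes PHP's PCRE functions and uses made-up sample strings; it shows that ? and . are literal inside a class, that \s still works there, and that the position of the dash matters.

```php
<?php
$class = '/^[-_!?.,a-zA-Z0-9\s]+$/';

// ?, ! and . need no escaping inside the class.
var_dump(preg_match($class, 'Really?! Yes, really.')); // int(1)

// The dot is literal inside a class, so it does not match arbitrary characters.
var_dump(preg_match('/^[.]$/', 'x')); // int(0)
var_dump(preg_match('/^[.]$/', '.')); // int(1)

// \s keeps its meaning inside the class: tab and newline are accepted.
var_dump(preg_match($class, "tab\tand\nnewline")); // int(1)

// A dash in the middle forms a range: [!-.] spans ! through . in ASCII,
// which happens to include #. A leading dash is just a literal dash.
var_dump(preg_match('/^[!-.]$/', '#')); // int(1)
var_dump(preg_match('/^[-!.]$/', '#')); // int(0)
```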

Upvotes: 0

taswyn

Reputation: 4513

When in doubt, it's always safe to escape non-alphanumeric characters in a class for matching, so the following is fine:

/^[\-\'a-zA-Z0-9\_\!\?\,\.\s]+$/

When run through a regular expression tester, this finds a match with your target just fine, so I would suggest you may have a problem elsewhere if that doesn't take care of everything.

I assume you're not including the quotes you used around the target when actually trying for a match? Since you didn't build double quote matching in...

You also wrote: "Can somebody point out what I am doing wrong? I must point out that my script takes the user input (the paragraph in quotes in this case) and strips all white space so actual input has no white space."

in which case you don't need the \s if it's working correctly.

Upvotes: 1

meustrus

Reputation: 7315

It's not necessary to escape most characters inside []. Note that \s is not guaranteed to work inside a character class in every regex flavor, although PHP's PCRE does support it there. If you would rather not rely on that, you have two options: either manually expand (/^[-\'a-zA-Z0-9_!?,. \t\n\r]+$/) or use alternation (/^(?:[-\'a-zA-Z0-9_!?,.]|\s)+$/).

Note that I left the \ before the ' because I'm assuming you're putting this in a PHP string and I wouldn't want to suggest a syntax error.
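A quick way to compare the variants is to run them against the same input. The sketch below assumes PHP's PCRE functions; the array keys and sample strings are only illustrative.

```php
<?php
// Three ways of allowing whitespace alongside the other characters.
$patterns = [
    'class with \s' => '/^[-\'a-zA-Z0-9_!?,.\s]+$/',
    'expanded'      => '/^[-\'a-zA-Z0-9_!?,. \t\n\r]+$/',
    'alternation'   => '/^(?:[-\'a-zA-Z0-9_!?,.]|\s)+$/',
];

$accepted = "It's fine, really!\tYes?"; // contains a space and a tab
$rejected = 'no <tags> please';         // < and > are not allowed

foreach ($patterns as $name => $pattern) {
    // Each line prints 1 for the accepted input and 0 for the rejected one.
    printf(
        "%-14s accepted=%d rejected=%d\n",
        $name,
        preg_match($pattern, $accepted),
        preg_match($pattern, $rejected)
    );
}
```

The expanded form covers only space, tab, newline, and carriage return, whereas \s in PCRE also matches a few other whitespace characters such as form feed, so the variants are not strictly identical.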

Upvotes: 0
