James Dawson

Reputation: 5409

Only allowing certain characters in a string

I want to disallow all symbols in a string, and instead of going and disallowing each one I thought it'd be easier to just allow alphanumeric characters (a-z A-Z 0-9).

How would I go about parsing a string and converting it to one which only has allowed characters? I also want to convert any spaces into _.

At the moment I have:

function parseFilename($name) {
    $allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $name = str_replace(' ', '_', $name);

    return $name;
}

Thanks

Upvotes: 0

Views: 2551

Answers (5)

mickmackusa

Reputation: 48073

Replacing spaces with underscores does not require the might of the regex engine; str_replace() can handle it before the regex replacement runs.

Purging every character that is not alphanumeric or an underscore is concisely handled by \W, which matches any character not in a-z, A-Z, 0-9, or _.

Code: (Demo)

function sanitizeFilename(string $name): string {
    return preg_replace(
               '/\W+/',
               '',
               str_replace(' ', '_', $name)
           );
}

echo sanitizeFilename('This/is My     1! FilenAm3');

Output:

Thisis_My_____1_FilenAm3

...but if you want to condense consecutive spaces into a single underscore, then use regex for both replacements. (Demo)

function sanitizeFilename(string $name): string {
    return preg_replace(
               ['/ +/', '/\W+/'],
               ['_', ''],
               $name
           );
}

echo sanitizeFilename('This/has a      Gap !n 1t');

Output:

Thishas_a_Gap_n_1t

Upvotes: 0

Leontin Groza

Reputation: 7

Try working with the HTML part

pattern="[A-Za-z]{8}" title="Eight letter country code">

Upvotes: -2

traq

Reputation: 281

You could do both replacements at once by passing arrays as the find / replace params to preg_replace():

$str = 'abc def+ghi&jkl   ...z';
$find = array( '#[\s]+#','#[^\w]+#' );
$replace = array( '_','' );
$newstr = preg_replace( $find,$replace,$str );
print $newstr;

// outputs:
// abc_defghijkl_z

[\s]+ matches a run of whitespace (replaced with a single underscore), and as @F.J described, [^\w] is anything "not a word character" (replaced with an empty string).
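
One thing worth spelling out: when preg_replace() is given arrays, the patterns are applied one after another, each on the result of the previous one, so the whitespace pattern has to come first. A minimal sketch of the difference, using a made-up input string (the variable names here are only for illustration):

```php
<?php
// Hypothetical input, chosen only to illustrate the ordering.
$str = 'abc def+ghi';

// Whitespace pattern first: spaces become "_" before \W+ can strip them.
$spacesFirst = preg_replace(['#\s+#', '#\W+#'], ['_', ''], $str);

// Reversed order: \W+ removes the space itself, so \s+ has nothing left to match.
$stripFirst = preg_replace(['#\W+#', '#\s+#'], ['', '_'], $str);

echo $spacesFirst, "\n"; // abc_defghi
echo $stripFirst, "\n";  // abcdefghi
```

With the order reversed the underscore never appears, because a space is itself a non-word character.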

Upvotes: 1

Andrew Clark

Reputation: 208665

preg_replace() is the way to go here. The following should do what you want:

function parseFilename($name) {
    $name = str_replace(' ', '_', $name);
    $name = preg_replace('/[^\w]+/', '', $name);
    return $name;
}

[^\w] is equivalent to [^a-zA-Z0-9_], which will match any character that is not alphanumeric or an underscore. The + after it means "match one or more", so a whole run of unwanted characters is removed in one replacement; this should be slightly more efficient than replacing each character individually.
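
A quick way to check that equivalence is to run the three spellings side by side. This is only a sketch with a made-up, ASCII-only input and no /u modifier; with the /u modifier or a non-default locale, \w may match more than a-zA-Z0-9_, so the patterns could then differ.

```php
<?php
// ASCII-only sample input (hypothetical), tested without the /u modifier.
$name = 'my_File-2024 (final)!.txt';

$a = preg_replace('/[^\w]+/', '', $name);          // negated \w inside a class
$b = preg_replace('/[^a-zA-Z0-9_]+/', '', $name);  // the class written out explicitly
$c = preg_replace('/\W+/', '', $name);             // shorthand for "not a word character"

var_dump($a === $b && $b === $c); // bool(true)
echo $a, "\n";                    // my_File2024finaltxt
```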

Upvotes: 0

Ansari

Reputation: 8218

Try

$name = preg_replace("/[^a-zA-Z0-9]/", "", $name);
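
Note that this pattern on its own also removes spaces and underscores, so it does not cover the "spaces to _" part of the question. A possible way to combine the two, as a sketch only (the function name is made up for illustration): convert the spaces first, then add _ to the allowed set.

```php
<?php
// Hypothetical helper name; not from the original post.
function keepAlnumAndUnderscore(string $name): string {
    // Convert spaces first so they survive the character filter.
    $name = str_replace(' ', '_', $name);
    // Drop anything outside the explicit whitelist (underscore included).
    return preg_replace('/[^a-zA-Z0-9_]/', '', $name);
}

echo keepAlnumAndUnderscore('My file (1).txt'), "\n"; // My_file_1txt
```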

Upvotes: 2
