BenMorel

How to know if a file name is valid on the current platform?

All platforms (and probably filesystems) have different rules about which characters are allowed in a file or directory name. Furthermore, some systems blacklist certain names outright: on Windows, for example, com1 is an invalid file name.
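
For Windows alone, the rules can at least be written down. The following is a minimal sketch, assuming the Win32 naming rules Microsoft documents: the reserved device names CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9 (also with an extension, e.g. NUL.txt), the forbidden characters < > : " / \ | ? *, ASCII control characters, and no trailing space or period. The function name `is_valid_windows_file_name` is hypothetical; as far as I know PHP has no built-in for this. It checks a single name (not a full path) and ignores length limits.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): rough check of ONE path component
 * against the documented Win32 naming rules. Length limits are not checked.
 */
function is_valid_windows_file_name(string $name): bool
{
    // Empty names and the directory references "." / ".." are not file names.
    if ($name === '' || $name === '.' || $name === '..') {
        return false;
    }
    // Forbidden characters: < > : " / \ | ? * and ASCII control characters 0-31.
    if (preg_match('/[<>:"\/\\\\|?*\x00-\x1F]/', $name)) {
        return false;
    }
    // Windows does not preserve a trailing space or period.
    if (preg_match('/[ .]$/D', $name)) {
        return false;
    }
    // Reserved device names, case-insensitive, with or without an extension.
    if (preg_match('/^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$/iD', $name)) {
        return false;
    }
    return true;
}
```

Note that "com10" passes, because only COM1 to COM9 are reserved.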

Is there a way in PHP to determine programmatically the rules for a valid file name?

As an alternative, is there a trustworthy list of safe characters, apart from [0-9a-zA-Z], that are guaranteed to be valid on any system?
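
One commonly cited candidate is the POSIX "portable filename character set": A-Z, a-z, 0-9, period, underscore and hyphen. POSIX also advises against a leading hyphen, since such names look like command-line options. The sketch below (the function name `is_portable_file_name` is hypothetical) only checks the character set; it does not rule out Windows reserved names such as com1, which consist entirely of portable characters.

```php
<?php
/**
 * Hypothetical helper: true if $name uses only the POSIX portable filename
 * character set (A-Z, a-z, 0-9, ".", "_", "-") and does not start with "-".
 * Windows reserved names (e.g. "com1") are NOT rejected here.
 */
function is_portable_file_name(string $name): bool
{
    // "." and ".." refer to directories, not files.
    if ($name === '.' || $name === '..') {
        return false;
    }
    // At least one character; first character is not a hyphen.
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[A-Za-z0-9._][A-Za-z0-9._-]*$/D', $name) === 1;
}
```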

Please note that a solution based on "try to save the file; if that fails, the file name is invalid" is not acceptable for my use case.

Answers (1)

Beshoy Girgis

This has already been answered well in "Sanitizing strings to make them URL and filename safe?"

I found this larger function in the Chyrp code:

/**
 * Function: sanitize
 * Returns a sanitized string, typically for URLs.
 *
 * Parameters:
 *     $string - The string to sanitize.
 *     $force_lowercase - Force the string to lowercase?
 *     $anal - If set to *true*, will remove all non-alphanumeric characters.
 */
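function sanitize($string, $force_lowercase = true, $anal = false) {
    // Punctuation (including typographic quotes and dashes) that is removed outright.
    $strip = array("~", "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "=", "+", "[", "{", "]",
                   "}", "\\", "|", ";", ":", "\"", "'", "‘", "’", "“", "”", "–", "—",
                   ",", "<", ".", ">", "/", "?");
    // Drop HTML tags, then the characters above, then surrounding whitespace.
    $clean = trim(str_replace($strip, "", strip_tags($string)));
    // Turn each run of whitespace into a single dash.
    $clean = preg_replace('/\s+/', "-", $clean);
    // "Anal" mode keeps only ASCII letters and digits (this also removes the dashes).
    $clean = ($anal) ? preg_replace("/[^a-zA-Z0-9]/", "", $clean) : $clean;
    if (!$force_lowercase) {
        return $clean;
    }
    // Prefer the multibyte-aware lowercase function when the mbstring extension is loaded.
    return function_exists('mb_strtolower') ? mb_strtolower($clean, 'UTF-8') : strtolower($clean);
}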

And this one in the WordPress code:

/**
 * Sanitizes a filename replacing whitespace with dashes
 *
 * Removes special characters that are illegal in filenames on certain
 * operating systems and special characters requiring special escaping
 * to manipulate at the command line. Replaces spaces and consecutive
 * dashes with a single dash. Trim period, dash and underscore from beginning
 * and end of filename.
 *
 * @since 2.1.0
 *
 * @param string $filename The filename to be sanitized
 * @return string The sanitized filename
 */
function sanitize_file_name( $filename ) {
  $filename_raw = $filename;
  $special_chars = array("?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`",
  "!", "{", "}");
  $special_chars = apply_filters('sanitize_file_name_chars', $special_chars, $filename_raw);
  $filename = str_replace($special_chars, '', $filename);
  $filename = preg_replace('/[\s-]+/', '-', $filename);
  $filename = trim($filename, '.-_');
  return apply_filters('sanitize_file_name', $filename, $filename_raw);
}
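
Both functions above are blacklists: they remove characters known to be troublesome and let everything else through, so accented letters survive and a name like com1 is returned unchanged. A whitelist approach is the inverse. The following is a minimal sketch of that alternative (the name `to_portable_file_name` and its `$fallback` parameter are hypothetical, not from Chyrp or WordPress): it keeps only the POSIX portable characters, borrows the dash-collapsing and trimming from the WordPress function, and renames Windows reserved device names.

```php
<?php
/**
 * Hypothetical helper (not from WordPress or Chyrp): whitelist-based sanitizer.
 * Everything outside A-Z, a-z, 0-9, ".", "_", "-" is replaced with "-".
 */
function to_portable_file_name(string $name, string $fallback = 'file'): string
{
    // Replace every run of non-portable bytes (multibyte UTF-8 included) with one dash.
    $clean = preg_replace('/[^A-Za-z0-9._-]+/', '-', $name);
    // Collapse repeated dashes, then trim dots, dashes and underscores from
    // both ends, as the WordPress function does.
    $clean = preg_replace('/-{2,}/', '-', $clean);
    $clean = trim($clean, '.-_');
    // Input made only of stripped characters would otherwise yield "".
    if ($clean === '') {
        return $fallback;
    }
    // Windows reserved device names ("com1", "NUL.txt") get "_" after the base name.
    if (preg_match('/^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$/iD', $clean, $m)) {
        $clean = $m[1] . '_' . substr($clean, strlen($m[1]));
    }
    return $clean;
}
```

The trade-off is that non-ASCII names lose information (each accented letter becomes a dash), which may or may not be acceptable for a given use case.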

Update (September 2012)

Alix Axel has done some impressive work in this area. His phunction framework includes several useful text filters and transformations.
