gat

Reputation: 2990

Check if a filename is valid

What would be the most conservative way to check if a file-name is valid in Python on all platforms (including mobile platforms like Android, iOS)?

Ex.

this_is_valid_name.jpg -> Valid

**adad.jpg -> Invalid

a/ad -> Invalid

Upvotes: 5

Views: 16884

Answers (3)

J Agustin Barrachina

Reputation: 4090

I wrote a function myself, using @Voo's answer as a starting point and adding checks based on this answer.

import re

def is_valid_folder_name(name: str):
    # Forbidden characters: < > : / \ | ? * " and control characters 0-31
    ILLEGAL_NTFS_CHARS = r'[<>:/\\|?*"]|[\x00-\x1f]'
    # Define a list of forbidden names
    FORBIDDEN_NAMES = ['CON', 'PRN', 'AUX', 'NUL',
                       'COM1', 'COM2', 'COM3', 'COM4', 'COM5',
                       'COM6', 'COM7', 'COM8', 'COM9',
                       'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5',
                       'LPT6', 'LPT7', 'LPT8', 'LPT9']
    # Check for forbidden characters
    match = re.search(ILLEGAL_NTFS_CHARS, name)
    if match:
        raise ValueError(
            f"Invalid character '{match[0]}' for filename {name}")
    # Check for forbidden names
    if name.split('.')[0].upper() in FORBIDDEN_NAMES:  # 'CON.txt' is reserved too
        raise ValueError(f"{name} is a reserved folder name in Windows")
    # Check for empty name (disallowed in Windows)
    if name.strip() == "":
        raise ValueError("Empty file name not allowed in Windows")
    # Check for names starting or ending with dot or space
    match = re.search(r'^[. ]|[. ]$', name)
    if match:
        raise ValueError(
            f"Invalid start or end character ('{match[0]}')"
            f" in folder name {name}"
        )

In your example:

$ is_valid_folder_name('this_is_valid_name.jpg')
$ is_valid_folder_name('**adad.jpg')
---------------------------------------------------------------------------
ValueError in is_valid_folder_name(name)
     13     match = re.search(ILLEGAL_NTFS_CHARS, name)
     14     if match:
---> 15         raise ValueError(
     16             f"Invalid character '{match[0]}' for filename {name}")
     17     # Check for forbidden names

ValueError: Invalid character '*' for filename **adad.jpg
$ is_valid_folder_name('a/ad')
---------------------------------------------------------------------------
ValueError in is_valid_folder_name(name)
     13     match = re.search(ILLEGAL_NTFS_CHARS, name)
     14     if match:
---> 15         raise ValueError(
     16             f"Invalid character '{match[0]}' for filename {name}")
     17     # Check for forbidden names

ValueError: Invalid character '/' for filename a/ad

If anyone finds something I missed, feel free to add it or comment!

Upvotes: 1

Sourav Ghosh

Reputation: 39

A related topic is "Filename Pattern Matching".

These are the methods and functions available to you:

  1. endswith() and startswith() string methods

  2. fnmatch.fnmatch()

  3. glob.glob()

  4. pathlib.Path.glob()

import os
# Get .txt files
for f_name in os.listdir('some_directory'):
    if f_name.endswith('.txt'):
        print(f_name)

Simple Filename Pattern Matching Using fnmatch()

import os
import fnmatch
for file_name in os.listdir('some_directory/'):
    if fnmatch.fnmatch(file_name, '*.txt'):
        print(file_name)

More Advanced Pattern Matching

import os
import fnmatch
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'data_*_backup.txt'):
        print(filename)

Filename Pattern Matching Using glob

import glob
glob.glob('*.py')

Or, to match names containing a digit:

import glob
for name in glob.glob('*[0-9]*.txt'):
    print(name)

Or, to search subdirectories recursively:

import glob
for file in glob.iglob('**/*.py', recursive=True):
    print(file)

Or, using pathlib:

from pathlib import Path
p = Path('.')
for name in p.glob('*.p*'):
    print(name)

Upvotes: -2

ddelemeny

Reputation: 1931

The strictest way to check whether a name would be a valid filename on your target OSes is to check it against a list of properly tested filenames.

valid = myfilename in ['this_is_valid_name.jpg']

Expanding on that, you could define a set of characters that you know are allowed in filenames on every platform:

valid = set(valid_char_sequence).issuperset(myfilename)
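As a minimal sketch of that character-set check, assuming the POSIX portable filename character set (ASCII letters, digits, '.', '_' and '-') as the allowed set. The names PORTABLE_CHARS and has_only_portable_chars are made up for illustration:

```python
import string

# Assumption: only the POSIX portable filename characters are allowed.
PORTABLE_CHARS = set(string.ascii_letters + string.digits + "._-")

def has_only_portable_chars(filename: str) -> bool:
    """Return True if filename is non-empty and uses only allowed characters."""
    return bool(filename) and PORTABLE_CHARS.issuperset(filename)

print(has_only_portable_chars("this_is_valid_name.jpg"))  # True
print(has_only_portable_chars("**adad.jpg"))              # False
print(has_only_portable_chars("a/ad"))                    # False
```

The empty-string guard is there because an empty set of characters is trivially a subset of any allowed set.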

But this is not going to be enough, as some OSes have reserved filenames.

You need to either exclude reserved names or write a regular expression matching each target platform's allowed filename domain, and test your filename against that.
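A rough sketch of such an expression, assuming a deliberately conservative rule set: portable characters only, no leading or trailing dot, and no Windows reserved device names (with or without an extension). SAFE_NAME and matches_safe_pattern are hypothetical names, and the pattern is not an exhaustive description of any one OS:

```python
import re

# Assumption: a conservative pattern, stricter than any single platform requires.
SAFE_NAME = re.compile(
    # Reject Windows device names such as CON, NUL, COM1 (also "CON.txt").
    r"(?!(?:CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(?:\.|$))"
    # First character: letter, digit, underscore or hyphen (no leading dot).
    r"[A-Za-z0-9_-]"
    # Remaining characters may also include dots.
    r"[A-Za-z0-9._-]*"
    # The name must not end with a dot.
    r"(?<!\.)",
    re.IGNORECASE,
)

def matches_safe_pattern(filename: str) -> bool:
    """Return True if the whole filename matches the conservative pattern."""
    return SAFE_NAME.fullmatch(filename) is not None

print(matches_safe_pattern("this_is_valid_name.jpg"))  # True
print(matches_safe_pattern("con.txt"))                 # False
print(matches_safe_pattern("a/ad"))                    # False
```

A pattern like this rejects many names that are perfectly legal on some platforms, which is the price of being conservative.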

AFAIK, Python does not offer such helpers, because it's Easier to Ask Forgiveness than Permission. There are many possible combinations of OSes and filesystems, so it's easier to react appropriately when the OS raises an exception than to check for a safe filename domain for all of them.
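A hedged sketch of the "ask forgiveness" approach: try to create the file and treat an exception as "invalid". The helper name can_create is hypothetical, and the result only reflects the OS and filesystem the code is running on, not every platform:

```python
import os
import tempfile

def can_create(filename: str, directory: str) -> bool:
    """Try to create filename inside directory; report whether the OS accepted it."""
    # Names containing a path separator would be resolved as paths, so reject them.
    if os.path.basename(filename) != filename:
        return False
    path = os.path.join(directory, filename)
    try:
        # Mode "x" fails if the file already exists, so nothing is overwritten.
        with open(path, "x"):
            pass
    except (OSError, ValueError):
        # OSError: rejected by the OS or filesystem (or the file already exists).
        # ValueError: for example, an embedded null byte.
        return False
    os.remove(path)  # clean up the probe file
    return True

# Probe inside a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    print(can_create("this_is_valid_name.jpg", tmp))
    print(can_create("a/ad", tmp))
```

Note that a name like '**adad.jpg' may be accepted on Linux and rejected on Windows, so this check complements rather than replaces a conservative pattern when names have to be portable.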

Upvotes: 4
