KdgDev

Reputation: 14529

Regex to replace characters that Windows doesn't accept in a filename

I'm trying to build a regular expression that will detect any character that Windows does not accept as part of a file name (are these the same for other OS? I don't know, to be honest).

These symbols are:

 \ / : * ? " < > |

Anyway, this is what I have: [\\/:*?\"<>|]

The tester over at http://gskinner.com/RegExr/ shows this to be working. For the string Allo*ha, the * symbol lights up, signalling it's been found. Should I enter Allo**ha however, only the first * will light up. So I think I need to modify this regex to find all appearances of the mentioned characters, but I'm not sure.

You see, in Java, I'm lucky enough to have the function String.replaceAll(String regex, String replacement). The description says:

Replaces each substring of this string that matches the given regular expression with the given replacement.

So in other words, even if the regex only finds the first and then stops searching, this function will still find them all.

For instance: String.replaceAll("[\\/:*?\"<>|]","")

However, I don't feel like I can take that risk. So does anybody know how I can extend this?
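A minimal sketch of the point above, assuming plain Java with no extra libraries: String.replaceAll already replaces every match, so no extra "global" modifier is needed. One caveat the sketch also illustrates: inside a Java string literal, a literal backslash in a character class takes four backslashes. With only two, as in the call above, the regex engine sees an escaped slash and leaves backslashes in the input untouched. The class and method names are hypothetical.

```java
final class ReplaceAllDemo {
    // Characters from the question: \ / : * ? " < > |
    // "\\\\" in Java source is the regex \\ which matches one literal backslash.
    static final String FORBIDDEN = "[\\\\/:*?\"<>|]";

    static String strip(String name) {
        // replaceAll applies the pattern to every match, not just the first.
        return name.replaceAll(FORBIDDEN, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Allo**ha"));    // Alloha
        System.out.println(strip("a\\b/c:d|e"));  // abcde
    }
}
```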

Upvotes: 41

Views: 52632

Answers (12)

Siddharth Ramawat

Reputation: 1

I ran into the same situation: I wanted to name files directly from a script whose text contained many special characters. The approach I came up with in Python was something like this:

re.sub(r"[^]\w\s`,!@#$&%_^\-)}{\['.(]", "_", text)

The Java equivalent would be (note that backslashes must be doubled inside a Java string literal):

text.replaceAll("[^\\]\\w\\s`,!@#$&%_^\\-)}{\\['.(]", "_")

Note: I'm using Windows 11 and it supports , ! @ # $ % ^ & ` '

@Balaco mentioned that Windows doesn't support %. I'm not sure which version that applies to, so please try naming files with special characters on your own system to figure out the rules.

Upvotes: 0

Alex_M

Reputation: 1874

Since no answer was good enough, I did it myself. Hope this helps ;)

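// Valid if the name is not empty, does not start with a dot,
// and contains none of the forbidden characters.
public static boolean validateFileName(String fileName) {
    return !fileName.isEmpty()
    && fileName.matches("^[^.\\\\/:*?\"<>|]?[^\\\\/:*?\"<>|]*");
}

// Strips leading dots and forbidden characters; throws if nothing is left.
public static String getValidFileName(String fileName) {
    // replaceAll is needed for both calls: String.replace treats its argument as literal text, not a regex
    String newFileName = fileName.replaceAll("^\\.+", "").replaceAll("[\\\\/:*?\"<>|]", "");
    if(newFileName.length()==0)
        throw new IllegalStateException(
                "File Name " + fileName + " results in an empty fileName!");
    return newFileName;
}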

Upvotes: 22

Jason

Reputation: 74

The required regex / syntax (JS):

.trim().replace(/[\\/:*?\"<>|]/g,"").substring(0,240);

where the last bit is optional, use only when you want to limit the length to 240.

other useful functions (JS):

.toUpperCase();
.toLowerCase();
.replace(/ {2,}/g,' ');  //normalising multiple spaces to one, add before substring.
.includes("str");        //check if a string segment is included in the filename
.split(".").pop();       //get extension, given the entire filename contains a .

Upvotes: 3

Adam111p

Reputation: 3717

I use a plain and simple regular expression. I list the characters that are allowed, and through the negation "^" every other character is replaced with a substitute such as "_".

String fileName = someString.replaceAll("[^a-zA-Z0-9\\.\\-]", "_");

For example, if you do not want "." to be allowed, remove the "\\." from the expression:

String fileName = someString.replaceAll("[^a-zA-Z0-9\\-]", "_");

Upvotes: 6

Ivan Aracki

Reputation: 5361

I made one very simple method that works for me for most common cases:

// replace special characters that windows doesn't accept
private String replaceSpecialCharacters(String string) {
    return string.replaceAll("[\\*/\\\\!\\|:?<>]", "_")
            .replaceAll("(%22)", "_");
}

%22 is the URL-encoded form of a quote ("), in case your file names contain one.

Upvotes: 0

Balaco

Reputation: 10

Windows also does not accept "%" in a file name.

If you are building a general expression for files that may eventually be moved to another operating system, I suggest that you also include other characters that may cause problems there.

For example, in Linux (many distributions I know), some users may have problems with files containing & ! ] [ / - ( ). The symbols are allowed in file names, but they may need special treatment by users, and some programs have bugs caused by their presence.

Upvotes: -2

Vysakh Prem

Reputation: 93

I keep only the word characters and whitespace characters from the original string, and I also make sure that no whitespace is left at the end of the string. Here is my code snippet in Java.

// "|" is a literal inside a character class, so it must not be used as a separator there
temp_string = original.replaceAll("[^\\w\\s]", "");
// "\\s+$" removes all trailing whitespace, not just the last character
final_string = temp_string.replaceAll("\\s+$", "");

I think I helped someone.

Upvotes: 1

bobince

Reputation: 536399

Windows filename rules are tricky. You're only scratching the surface.

For example, here are some things that are not valid filenames, in addition to the characters you listed:

                                    (yes, that's an empty string)
.
.a
a.
 a                                  (that's a leading space)
a                                   (or a trailing space)
com
prn.txt
[anything over 240 characters]
[any control characters]
[any non-ASCII characters that don't fit in the system codepage,
 if the filesystem is FAT32]

Removing special characters in a single regex sub like String.replaceAll() isn't enough; you can easily end up with something invalid like an empty string or a trailing ‘.’ or ‘ ’. Replacing something like “[^A-Za-z0-9_.]+” with ‘_’ would be a better first step (use ‘+’ rather than ‘*’, since ‘*’ also matches the empty string and would insert ‘_’ between every character). But you will still need higher-level processing on whatever platform you're using.
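As a rough illustration of that higher-level processing, here is a minimal Java sketch. It is an assumption-laden example, not a complete implementation of the Windows rules: the class name, the "_" fallback, and the choice to prefix reserved names with "_" are all hypothetical, and the 240-character cap is simply taken from the list above. The reserved device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) are the ones Windows documents as reserved.

```java
import java.util.Locale;
import java.util.Set;

final class WindowsFileNames {
    // Device names Windows reserves, with or without an extension.
    private static final Set<String> RESERVED = Set.of(
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9");

    // Length cap taken from the list above; an assumption, not an exact OS limit.
    private static final int MAX_LENGTH = 240;

    static String sanitize(String input) {
        // Whitelist first: each run of other characters (spaces, control
        // characters, non-ASCII, punctuation) becomes a single underscore.
        String name = input.replaceAll("[^A-Za-z0-9_.]+", "_");
        // Drop leading and trailing dots.
        name = name.replaceAll("^\\.+|\\.+$", "");
        // Hypothetical fallback so the result is never empty.
        if (name.isEmpty()) {
            name = "_";
        }
        // Prefix names whose base (the part before the first dot) is reserved.
        int dot = name.indexOf('.');
        String base = (dot < 0 ? name : name.substring(0, dot)).toUpperCase(Locale.ROOT);
        if (RESERVED.contains(base)) {
            name = "_" + name;
        }
        // Truncate last, then drop any trailing dots the cut exposed.
        if (name.length() > MAX_LENGTH) {
            name = name.substring(0, MAX_LENGTH).replaceAll("\\.+$", "");
        }
        return name;
    }
}
```

With this sketch, sanitize("prn.txt") gives "_prn.txt", sanitize("a.") gives "a", and an empty input gives "_".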

Upvotes: 19

Pesto

Reputation: 23880

Java has a replaceAll function, but every programming language has a way to do something similar. Perl, for example, uses the g switch to signify a global replacement. Python's sub function allows you to specify the number of replacements to make. If, for some reason, your language didn't have an equivalent, you can always do something like this:

// keep going while the name still contains a bad character
while (filename.matches(".*" + bad_characters + ".*"))
  filename = filename.replaceFirst(bad_characters, "");

Upvotes: 1

jpalecek

Reputation: 47762

You cannot do this with a single regexp, because a regexp always matches a substring of the input. Consider the word Alo*h*a: there is no substring that contains all the *s and no other character. So if you can use the replaceAll function, just stick with it.

BTW, the set of forbidden characters is different in other OSes.
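To make the substring point concrete, here is a small Java sketch (the class and method names are hypothetical) that uses the standard java.util.regex.Matcher.find() loop. Each call to find() reports one matching substring and resumes after it, which is the kind of iteration that functions like replaceAll perform for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchPositions {
    // Same forbidden set as in the question, with the backslash included.
    private static final Pattern FORBIDDEN = Pattern.compile("[\\\\/:*?\"<>|]");

    // Returns the index of every forbidden character in the input.
    static List<Integer> positions(String input) {
        List<Integer> found = new ArrayList<>();
        Matcher m = FORBIDDEN.matcher(input);
        while (m.find()) {          // each call resumes after the previous match
            found.add(m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(positions("Alo*h*a"));  // [3, 5]
    }
}
```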

Upvotes: 0

Artelius

Reputation: 49089

For the record, POSIX-compliant systems (including UNIX and Linux) support all characters except the null character ('\0') and forward slash ('/') in filenames. Special characters such as space and asterisk must be escaped on the command line so that they do not take their usual roles.
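Under that rule, a sanitizer for POSIX-style systems only has to deal with two characters. A minimal sketch in Java, assuming "_" as the replacement (the class name and the replacement character are hypothetical choices):

```java
final class PosixFileNames {
    // Replaces the only two characters POSIX forbids in a file name:
    // the forward slash and the null character (written \x00 in the regex).
    static String sanitize(String name) {
        return name.replaceAll("[/\\x00]", "_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a/b"));    // a_b
        System.out.println(sanitize("a*b c"));  // a*b c (unchanged)
    }
}
```

Note that the names "." and ".." still refer to the current and parent directory, so they cannot be used for ordinary files even though their characters are allowed.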

Upvotes: 1

Kredns

Reputation: 37211

You might try allowing only the stuff you want the user to be able to enter, for example A-Z, a-z, and 0-9.

Upvotes: -1
