Selim

Reputation: 1132

Avoid overwriting files using regex

I have a class that replaces the illegal characters a string might contain, so that the string can be used as a filename. The problem is that it replaces every illegal character with "_", which is fine as long as the string does not consist entirely of illegal characters. For example, cleanFilename(">>>") returns the same string as cleanFilename("***"), so storing "***" in a file after storing ">>>" would overwrite the first file.

public class StringCleaner {

    public static String cleanFilename(String dirtyString) {
        return dirtyString.replaceAll("[:\\/*?|<> ]", "_");
    }

    public static String cleanDirectory(String dirtyDirectory) {
        return dirtyDirectory.replaceAll("[:\\*?|<> ]", "_");
    }
}
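
To make the collision concrete, here is a minimal sketch that reproduces it. The class name CollisionDemo is hypothetical; cleanFilename is copied unchanged from the code above.

```java
// Hypothetical demo class; cleanFilename is copied from the question.
class CollisionDemo {

    static String cleanFilename(String dirtyString) {
        return dirtyString.replaceAll("[:\\/*?|<> ]", "_");
    }

    public static void main(String[] args) {
        String first = cleanFilename(">>>");
        String second = cleanFilename("***");
        System.out.println(first);                 // ___
        System.out.println(second);                // ___
        System.out.println(first.equals(second));  // true: both map to the same filename
    }
}
```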

What can I change to avoid this problem?
Sorry for the awkward title; I could not find a better one.

Update: I want the filenames to stay readable, so that a file can be identified just by reading its name.

Thanks
Selim

Upvotes: 0

Views: 83

Answers (1)

rolfl

Reputation: 17707

So you are looking for a reversible and repeatable mechanism for replacing funny characters in file names. A typical way to do this is to create an escape sequence. For example, consider the following:

Pick a single character to use as the escape character. It must be legal in a file name but not commonly used.

Let's choose the + character. Then we replace every illegal character with a sequence of characters that uniquely identifies it. The + itself has to be replaced as well, otherwise a literal + could not be told apart from the start of an escape sequence.

For example, replacing the space (character 32) in the file name "this has a space" gives "this+32+has+32+a+32+space":

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringCleaner {

    public static void main(String[] args) {
        System.out.println(cleanFilename("this has a space"));
        System.out.println(cleanFilename("this has a plus +"));
        System.out.println(cleanFilename("this is full :\\/*?|<> + of stuff"));
    }

    // '+' is the escape character, so it is listed among the characters to replace.
    private static final Pattern illegalfilechars = Pattern.compile("[:\\/*?|<> +]");
    private static final Pattern illegaldirchars = Pattern.compile("[:\\*?|<> +]");

    // Replaces every match with "+<decimal character code>+".
    private static String replaceall(Pattern pattern, String dirtyString) {
        Matcher mat = pattern.matcher(dirtyString);
        if (!mat.find()) {
            // Nothing to escape: return the input unchanged.
            return dirtyString;
        }
        StringBuffer sb = new StringBuffer();
        do {
            mat.appendReplacement(sb, "+" + (int) mat.group(0).charAt(0) + "+");
        } while (mat.find());
        mat.appendTail(sb);
        return sb.toString();
    }

    public static String cleanFilename(String dirtyString) {
        return replaceall(illegalfilechars, dirtyString);
    }

    public static String cleanDirectory(String dirtyDirectory) {
        return replaceall(illegaldirchars, dirtyDirectory);
    }
}

When I run the code I get the results:

this+32+has+32+a+32+space
this+32+has+32+a+32+plus+32++43+
this+32+is+32+full+32++58+\+47++42++63++124++60++62++32++43++32+of+32+stuff

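The last line also shows that the pattern is wrong for the '\' character, which is left unescaped. In a Java string literal, "\\" becomes a single regex backslash, and inside the character class that backslash only escapes the character after it. Matching a literal backslash requires "\\\\".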
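
Since the scheme is meant to be reversible, here is a minimal sketch of a pattern that does match the backslash, together with the reverse mapping. The class name FilenameEscaper and the methods escape and unescape are hypothetical and not part of the code above, and unescape assumes its input was produced by escape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class FilenameEscaper {

    // "\\\\" in a Java literal is the regex \\, which matches one literal backslash.
    private static final Pattern ILLEGAL = Pattern.compile("[:\\\\/*?|<> +]");

    // One escape sequence: '+', a decimal character code, '+'.
    private static final Pattern ESCAPED = Pattern.compile("\\+(\\d+)\\+");

    static String escape(String name) {
        Matcher m = ILLEGAL.matcher(name);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "+" + (int) m.group().charAt(0) + "+");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Assumes the input was produced by escape().
    static String unescape(String escaped) {
        Matcher m = ESCAPED.matcher(escaped);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char original = (char) Integer.parseInt(m.group(1));
            // quoteReplacement keeps '\' and '$' from being read as replacement syntax.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(original)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirty = "a\\b c+d";            // the characters: a\b c+d
        String clean = escape(dirty);
        System.out.println(clean);            // a+92+b+32+c+43+d
        System.out.println(unescape(clean));  // a\b c+d
    }
}
```

With this scheme ">>>" becomes "+62++62++62+" and "***" becomes "+42++42++42+", so the two names no longer collide.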

Upvotes: 1
