Ken

Reputation: 2731

How to make a valid Windows filename from an arbitrary string?

I've got a string like "Foo: Bar" that I want to use as a filename, but on Windows the ":" char isn't allowed in a filename.

Is there a method that will turn "Foo: Bar" into something like "Foo- Bar"?

Upvotes: 116

Views: 54303

Answers (17)

Filippo Mondinelli

Reputation: 11

An efficient way to do this is

    string.Join("_", fileName.Split(System.IO.Path.GetInvalidFileNameChars(), StringSplitOptions.RemoveEmptyEntries))

Upvotes: 1

EricBDev

Reputation: 1569

Yet another solution, which I have been using for the last ~10 years. It is very similar to the previous solutions, minus the 'fancy' parts. The main method takes the special characters as input because I also used it for other purposes, e.g. producing web-compatible names, especially back when I was renaming files for SharePoint/OneDrive.

I'm not sure how much it improves speed, but I also chose to check the filename for special characters with IndexOfAny() BEFORE creating the StringBuilder.

private static string SanitizeFilename(this string filename) 
   => filename.RemoveOrReplaceSpecialCharacters(Path.GetInvalidFileNameChars(), '_');

private static string RemoveOrReplaceSpecialCharacters(this string str, char[] specialCharacters, char? replaceChar)
{
    if (string.IsNullOrEmpty(str))
        return str;
    if (specialCharacters == null || specialCharacters.Length == 0)
        return str;

    // IndexOfAny returns -1 when no special character is present, so nothing needs replacing.
    if (str.IndexOfAny(specialCharacters) < 0)
        return str;

    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
    {
        if (!specialCharacters.Contains(c))
            sb.Append(c);
        else if (replaceChar.HasValue)
            sb.Append(replaceChar.Value);
    }
    return sb.ToString();         
}

I also tried

return new string(str.Except(specialCharacters).ToArray());

but it behaved strangely: Except is a set operation, so it also drops repeated characters, among other issues. For instance, with - as the only special character, "Bla-Bla" becomes "Bla".
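The difference comes from Except computing a set difference, which yields each distinct character only once. A minimal sketch contrasting it with a Where filter; the names ExceptDemo, WithExcept and WithWhere are hypothetical and not part of the answer above:

```csharp
using System;
using System.Linq;

public static class ExceptDemo
{
    // Set difference: removes the special characters AND any repeated characters.
    public static string WithExcept(string s, char[] special) =>
        new string(s.Except(special).ToArray());

    // Plain filter: removes only the special characters and keeps repeats.
    public static string WithWhere(string s, char[] special) =>
        new string(s.Where(c => Array.IndexOf(special, c) < 0).ToArray());
}
```

With '-' as the only special character, WithExcept("Bla-Bla", ...) gives "Bla", while WithWhere gives "BlaBla".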

Upvotes: 0

Ezh

Reputation: 619

There are no valid answers in this topic yet. The author said: "...I want to use as a filename...". Removing or replacing invalid characters is not enough to make something usable as a filename. You should at least check that:

  1. No file with that name already exists in the folder where you want to create the new one
  2. The total path to the file (folder path + filename + extension) does not exceed MAX_PATH (260 characters). Yes, recent versions of Windows offer ways around this limit, but if you want your app to work reliably, you should check it
  3. You don't use any special filenames (see answer by @Phil Price)

Probably the best way would be to:

  1. Remove bad characters using one of the other answers here.
  2. Make sure the total path is less than 260 characters (if not, remove the last N chars)
  3. Make sure no file with the given filename exists (if one does, replace the last N chars until you find an available filename)
  4. Make sure you don't use any reserved filenames (if you do, replace the last N chars until you find a proper and available filename)

As always, things are more complicated than they look. It is better to use an existing function, like GetTempFileNameW.
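Steps 2 and 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name MakeAvailablePath, the " (n)" suffix scheme and the 259-character default are hypothetical choices, the input is assumed to have had invalid characters replaced already, and the reserved-name check from step 4 is left out.

```csharp
using System;
using System.IO;

public static class UniqueFileName
{
    // Hypothetical helper: trims the name to fit the path limit and
    // appends " (1)", " (2)", ... until no file with that name exists.
    public static string MakeAvailablePath(string directory, string fileName, int maxPathLength = 259)
    {
        string extension = Path.GetExtension(fileName);
        string stem = Path.GetFileNameWithoutExtension(fileName);

        for (int attempt = 0; attempt < 10000; attempt++)
        {
            string suffix = attempt == 0 ? "" : " (" + attempt + ")";

            // Characters left for the stem after the directory, one separator,
            // the suffix and the extension have been accounted for.
            int room = maxPathLength - directory.Length - 1 - suffix.Length - extension.Length;
            if (room < 1)
                throw new PathTooLongException("No room left for a file name in " + directory);

            string trimmedStem = stem.Length > room ? stem.Substring(0, room) : stem;
            string candidate = Path.Combine(directory, trimmedStem + suffix + extension);
            if (!File.Exists(candidate))
                return candidate;
        }
        throw new IOException("Could not find an available file name.");
    }
}
```

Note that checking File.Exists and creating the file later is not atomic, so another process could still claim the name in between; opening the file with FileMode.CreateNew and retrying on failure would close that gap.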

Upvotes: 1

Qwertie

Reputation: 17176

In case anyone wants an optimized version based on StringBuilder, use this. Includes rkagerer's trick as an option.

static char[] _invalids;

/// <summary>Replaces characters in <c>text</c> that are not allowed in 
/// file names with the specified replacement character.</summary>
/// <param name="text">Text to make into a valid filename. The same string is returned if it is valid already.</param>
/// <param name="replacement">Replacement character, or null to simply remove bad characters.</param>
/// <param name="fancy">Whether to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
/// <returns>A string that can be used as a filename. If the output string would otherwise be empty, returns "_".</returns>
public static string MakeValidFileName(string text, char? replacement = '_', bool fancy = true)
{
    StringBuilder sb = new StringBuilder(text.Length);
    var invalids = _invalids ?? (_invalids = Path.GetInvalidFileNameChars());
    bool changed = false;
    for (int i = 0; i < text.Length; i++) {
        char c = text[i];
        if (invalids.Contains(c)) {
            changed = true;
            var repl = replacement ?? '\0';
            if (fancy) {
                if (c == '"')       repl = '”'; // U+201D right double quotation mark
                else if (c == '\'') repl = '’'; // U+2019 right single quotation mark
                else if (c == '/')  repl = '⁄'; // U+2044 fraction slash
            }
            if (repl != '\0')
                sb.Append(repl);
        } else
            sb.Append(c);
    }
    if (sb.Length == 0)
        return "_";
    return changed ? sb.ToString() : text;
}

Upvotes: 16

Moch Yusup

Reputation: 1346

A simple one-liner:

var validFileName = Path.GetInvalidFileNameChars().Aggregate(fileName, (f, c) => f.Replace(c, '_'));

You can wrap it in an extension method if you want to reuse it.

public static string ToValidFileName(this string fileName) => Path.GetInvalidFileNameChars().Aggregate(fileName, (f, c) => f.Replace(c, '_'));

Upvotes: 12

Joseph Gabriel

Reputation: 8510

This isn't more efficient, but it's more fun :)

var fileName = "foo:bar";
var invalidChars = System.IO.Path.GetInvalidFileNameChars();
var cleanFileName = new string(fileName.Where(m => !invalidChars.Contains(m)).ToArray<char>());

Upvotes: 20

mheyman

Reputation: 4325

I needed a system that couldn't create collisions so I couldn't map multiple characters to one. I ended up with:

public static class Extension
{
    /// <summary>
    /// Characters allowed in a file name. Note that curly braces don't show up here
    /// because they are used for escaping invalid characters.
    /// </summary>
    private static readonly HashSet<char> CleanFileNameChars = new HashSet<char>
    {
        ' ', '!', '#', '$', '%', '&', '\'', '(', ')', '+', ',', '-', '.',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '=', '@',
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
        '[', ']', '^', '_', '`',
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    };

    /// <summary>
    /// Creates a clean file name from one that may contain invalid characters in 
    /// a way that will not collide.
    /// </summary>
    /// <param name="dirtyFileName">
    /// The file name that may contain invalid filename characters.
    /// </param>
    /// <returns>
    /// A file name that does not contain invalid filename characters.
    /// </returns>
    /// <remarks>
    /// <para>
    /// Escapes invalid characters by converting their ASCII values to hexadecimal
    /// and wrapping that value in curly braces. Curly braces are escaped by doubling
    /// them, for example '{' => "{{".
    /// </para>
    /// <para>
    /// Note that although NTFS allows unicode characters in file names, this
    /// method does not.
    /// </para>
    /// </remarks>
    public static string CleanFileName(this string dirtyFileName)
    {
        string EscapeHexString(char c) =>
            "{" + (c > 255 ? $"{(uint)c:X4}" : $"{(uint)c:X2}") + "}";

        return string.Join(string.Empty,
                           dirtyFileName.Select(
                               c =>
                                   c == '{' ? "{{" :
                                   c == '}' ? "}}" :
                                   CleanFileNameChars.Contains(c) ? $"{c}" :
                                   EscapeHexString(c)));
    }
}

Upvotes: 1

GDemartini

Reputation: 351

Another simple solution:

private string MakeValidFileName(string original, char replacementChar = '_')
{
  var invalidChars = new HashSet<char>(Path.GetInvalidFileNameChars());
  return new string(original.Select(c => invalidChars.Contains(c) ? replacementChar : c).ToArray());
}

Upvotes: 6

DavidG

Reputation: 118937

Here's a version of the accepted answer that uses LINQ's Enumerable.Aggregate:

string fileName = "something";

// Aggregate returns the cleaned string; assign it, since strings are immutable.
fileName = Path.GetInvalidFileNameChars()
    .Aggregate(fileName, (current, c) => current.Replace(c, '_'));

Upvotes: 11

jnm2

Reputation: 8354

Here's a version that uses StringBuilder and IndexOfAny with bulk append for full efficiency. When nothing needs replacing, it returns the original string rather than creating a copy.

Last but not least, it has a switch statement that returns look-alike characters which you can customize any way you wish. Check out Unicode.org's confusables lookup to see what options you might have, depending on the font.

public static string GetSafeFilename(string arbitraryString)
{
    var invalidChars = System.IO.Path.GetInvalidFileNameChars();
    var replaceIndex = arbitraryString.IndexOfAny(invalidChars, 0);
    if (replaceIndex == -1) return arbitraryString;

    var r = new StringBuilder();
    var i = 0;

    do
    {
        r.Append(arbitraryString, i, replaceIndex - i);

        switch (arbitraryString[replaceIndex])
        {
            case '"':
                r.Append("''");
                break;
            case '<':
                r.Append('\u02c2'); // '˂' (modifier letter left arrowhead)
                break;
            case '>':
                r.Append('\u02c3'); // '˃' (modifier letter right arrowhead)
                break;
            case '|':
                r.Append('\u2223'); // '∣' (divides)
                break;
            case ':':
                r.Append('-');
                break;
            case '*':
                r.Append('\u2217'); // '∗' (asterisk operator)
                break;
            case '\\':
            case '/':
                r.Append('\u2044'); // '⁄' (fraction slash)
                break;
            case '\0':
            case '\f':
            case '?':
                break;
            case '\t':
            case '\n':
            case '\r':
            case '\v':
                r.Append(' ');
                break;
            default:
                r.Append('_');
                break;
        }

        i = replaceIndex + 1;
        replaceIndex = arbitraryString.IndexOfAny(invalidChars, i);
    } while (replaceIndex != -1);

    r.Append(arbitraryString, i, arbitraryString.Length - i);

    return r.ToString();
}

It doesn't check for ., .., or reserved names like CON because it isn't clear what the replacement should be.

Upvotes: 5

rkagerer

Reputation: 4284

Here's a slight twist on Diego's answer.

If you're not afraid of Unicode, you can retain a bit more fidelity by replacing the invalid characters with valid Unicode symbols that resemble them. Here's the code I used in a recent project involving lumber cutlists:

static string MakeValidFilename(string text) {
  text = text.Replace('\'', '’'); // U+2019 right single quotation mark
  text = text.Replace('"',  '”'); // U+201D right double quotation mark
  text = text.Replace('/', '⁄');  // U+2044 fraction slash
  foreach (char c in System.IO.Path.GetInvalidFileNameChars()) {
    text = text.Replace(c, '_');
  }
  return text;
}

This produces filenames like 1⁄2” spruce.txt instead of 1_2_ spruce.txt

Yes, it really works:

[Screenshot: Explorer sample]

Caveat Emptor

I knew this trick would work on NTFS but was surprised to find it also works on FAT and FAT32 partitions. That's because long filenames are stored in Unicode, even as far back as Windows 95/NT. I tested on Win7, XP, and even a Linux-based router and they showed up OK. Can't say the same for inside a DOSBox.

That said, before you go nuts with this, consider whether you really need the extra fidelity. The Unicode look-alikes could confuse people or old programs, e.g. older OS's relying on codepages.

Upvotes: 9

Joan Vilariño

Reputation: 143

After cleaning up my code and refactoring it a little, I created an extension method for the string type:

public static string ToValidFileName(this string s, char replaceChar = '_', char[] includeChars = null)
{
  var invalid = Path.GetInvalidFileNameChars();
  if (includeChars != null) invalid = invalid.Union(includeChars).ToArray();
  // Requires System.Linq for Union, Select and Contains.
  return string.Join(string.Empty, s.ToCharArray().Select(o => invalid.Contains(o) ? replaceChar : o));
}

Now it's easier to use with:

var name = @"Any string you want using ? / \ or even +.zip"; // verbatim string, so the backslash needs no escaping
var validFileName = name.ToValidFileName();

If you want to replace with a different char than "_" you can use:

var validFileName = name.ToValidFileName(replaceChar:'#');

You can also add characters to replace, for example if you don't want spaces or commas:

var validFileName = name.ToValidFileName(includeChars: new [] { ' ', ',' });

Hope it helps...

Cheers

Upvotes: 3

Joan Vilariño

Reputation: 143

I needed to do this today... in my case, I needed to concatenate a customer name with the date and time for a final .kmz file. My final solution was this:

 string name = "Whatever name with valid/invalid chars";
 char[] invalid = System.IO.Path.GetInvalidFileNameChars();
 // Requires System.Linq for Select and Contains.
 string validFileName = string.Join(string.Empty,
                            string.Format("{0}.{1:G}.kmz", name, DateTime.Now)
                            .ToCharArray().Select(o => invalid.Contains(o) ? '_' : o));

You can even make it replace spaces if you add the space char to the invalid array.

Maybe it's not the fastest, but as performance wasn't an issue, I found it elegant and understandable.

Cheers!

Upvotes: 0

Diego Jancic

Reputation: 7440

Try something like this:

string fileName = "something";
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
   fileName = fileName.Replace(c, '_');
}

Edit:

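Since GetInvalidFileNameChars() returns a few dozen characters on Windows (the ASCII control characters plus punctuation such as : and ?), each Replace call in the loop may allocate a new string. It's better to use a StringBuilder instead of a simple string; the original version will take longer and consume more memory.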
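A StringBuilder-based variant might look like the sketch below. The class and method names (FileNameSanitizer, ReplaceInvalidChars) are hypothetical; the approach simply walks the string once instead of calling Replace once per invalid character.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileNameSanitizer
{
    // Hypothetical helper: replaces every invalid file name character in a single pass.
    public static string ReplaceInvalidChars(string fileName, char replacement = '_')
    {
        char[] invalid = Path.GetInvalidFileNameChars();

        // Fast path: no invalid character found, so return the input untouched.
        if (fileName.IndexOfAny(invalid) < 0)
            return fileName;

        var sb = new StringBuilder(fileName.Length);
        foreach (char c in fileName)
        {
            // Array.IndexOf returns -1 when c is not in the invalid set.
            sb.Append(Array.IndexOf(invalid, c) >= 0 ? replacement : c);
        }
        return sb.ToString();
    }
}
```

Keep in mind that the set returned by Path.GetInvalidFileNameChars() depends on the platform the code runs on, so the result for a string like "Foo: Bar" can differ between Windows and other systems.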

Upvotes: 181

D W

Reputation: 3079

You can do this with a sed command:

 sed -e "
 s/[?()\[\]=+<>:;©®”,*|]/_/g
 s/"$'\t'"/ /g
 s/–/-/g
 s/\"/_/g
 s/[[:cntrl:]]/_/g"

Upvotes: -2

leggetter

Reputation: 15467

Diego does have the correct solution, but there is one very small mistake in it. The version of string.Replace being used should be string.Replace(char, char); there isn't a string.Replace(char, string).

I can't edit the answer or I would have just made the minor change.

So it should be:

string fileName = "something";
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
   fileName = fileName.Replace(c, '_');
}

Upvotes: 8

Phil Price

Reputation: 2313

fileName = fileName.Replace(":", "-");

However ":" is not the only illegal character for Windows. You will also have to handle:

/, \, :, *, ?, ", <, > and |

These are contained in System.IO.Path.GetInvalidFileNameChars();

Also (on Windows), a filename cannot consist only of "." characters (".", "..", "...", and so on are all invalid). Be careful when naming files that end with ".", for example:

echo "test" > .test.

Will generate a file named ".test"

Lastly, if you really want to do things correctly, there are some special file names you need to look out for. On Windows you can't create files named:

CON, PRN, AUX, CLOCK$, NUL
COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.
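A check for these cases could be sketched as below. The names ReservedNames, IsReserved and IsOnlyDots are hypothetical, the list is copied from this answer, and the sketch assumes that a reserved name followed by an extension (such as "NUL.txt") should be rejected as well.

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNames
{
    // Names taken from the list above; compared case-insensitively.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "CLOCK$", "NUL",
            "COM0", "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT0", "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9",
        };

    // Assumption: only the part before the first '.' is compared,
    // so "NUL.txt" and "nul.tar.gz" count as reserved too.
    public static bool IsReserved(string fileName)
    {
        int dot = fileName.IndexOf('.');
        string stem = dot < 0 ? fileName : fileName.Substring(0, dot);
        return Reserved.Contains(stem);
    }

    // True for ".", "..", "..." and so on.
    public static bool IsOnlyDots(string fileName) =>
        fileName.Length > 0 && fileName.Trim('.').Length == 0;
}
```

What to do when a name is rejected (prefix it, append a character, or ask the user) is left open here, since the right replacement depends on the application.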

Upvotes: 39
