Chris Mullins

Reputation: 6867

How to check for a valid Base64 encoded string

Is there a way in C# to see if a string is Base64 encoded, other than just trying to convert it and seeing if there is an error? I have code like this:

// Convert base64-encoded hash value into a byte array.
byte[] HashBytes = Convert.FromBase64String(Value);

I want to avoid the "Invalid character in a Base-64 string" exception that is thrown if the value is not a valid Base64 string. I would rather check and return false than handle an exception, because I expect that this value will sometimes not be a Base64 string. Is there some way to check before calling Convert.FromBase64String?

Upvotes: 187

Views: 249569

Answers (22)

newky2k

Reputation: 479

I know this is an old question, but I recently stumbled onto the subject and have the same requirement.

According to Microsoft's documentation, there is now a method on the Base64 class, Base64.IsValid, that validates whether a string (or byte array) is Base64 encoded.

It was added in .NET 8.0 and is not available in .NET Standard.

https://learn.microsoft.com/en-us/dotnet/api/system.buffers.text.base64.isvalid?view=net-9.0

Validates that the specified span of text is comprised of valid base-64 encoded data.

true if base64Text contains a valid, decodable sequence of base-64 encoded data; otherwise, false.

In my limited testing, to satisfy my requirements, it returns false for plain text and true for an encoded string.

Upvotes: 0

khellang

Reputation: 18102

I'm quite surprised that no one has mentioned System.Buffers.Text.Base64 which was introduced in .NET Core 2.1 (and is part of .NET Standard 2.0).

It has an IsValid method (added in .NET 8) to check whether a ReadOnlySpan<char> or a ReadOnlySpan<byte> is valid Base64. Since string is implicitly convertible to ReadOnlySpan<char>, you can simply pass in a string as well. Its XML docs state:

If the method returns true, the same text passed to Convert.FromBase64String(string) and Convert.TryFromBase64Chars would successfully decode. Any amount of whitespace is allowed anywhere in the input, where whitespace is defined as the characters ' ', '\t', '\r', or '\n'.

Using this method has several advantages over all existing answers provided here:

  • It is correct and maintained by Microsoft
  • It is highly optimized, using new primitives like SearchValues<T> and vectorization
  • It does not allocate (potentially huge byte arrays)
  • It does not rely on catching exceptions, which is expensive
  • It does not use RegEx, which is expensive (especially for large strings)
  • It has an overload that returns back the decoded number of bytes as an out parameter in the same operation
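
A minimal sketch of how the two IsValid overloads might be called, assuming a project that targets .NET 8 or later. The class and method names (Base64Check, LooksValid, TryGetDecodedLength) are made up for illustration; only Base64.IsValid comes from the framework.

```csharp
using System;
using System.Buffers.Text;

public static class Base64Check
{
    // Validation only: nothing is decoded and no byte array is allocated.
    public static bool LooksValid(string text)
    {
        // A null string would convert to an empty span, so reject it explicitly.
        return text != null && Base64.IsValid(text);
    }

    // Validation plus the number of bytes a successful decode would produce.
    public static bool TryGetDecodedLength(string text, out int decodedLength)
    {
        decodedLength = 0;
        return text != null && Base64.IsValid(text, out decodedLength);
    }
}
```

For example, Base64Check.TryGetDecodedLength("WW91ckJhc2U2NHN0cmluZw==", out int n) should return true with n equal to 16, while Base64Check.LooksValid("not base64!") should return false.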

Upvotes: 1

Tomas Kubes

Reputation: 25098

Use Convert.TryFromBase64String from C# 7.2 (.NET Core 2.1+ or .NET Standard 2. and higher).

public static bool IsBase64String(string base64)
{
    // The decoded data is never longer than the encoded text, so this buffer is always large enough.
    Span<byte> buffer = new Span<byte>(new byte[base64.Length]);
    return Convert.TryFromBase64String(base64, buffer, out int bytesParsed);
}
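
If the decoded bytes are needed as well, the same call can validate and decode in one pass. This is only a sketch: the class and method names (Base64Decoding, TryDecode) are hypothetical, and the buffer size assumes that every four input characters decode to at most three bytes.

```csharp
using System;

public static class Base64Decoding
{
    // Hypothetical helper: validates and decodes in a single pass, without exceptions.
    public static bool TryDecode(string text, out byte[] bytes)
    {
        bytes = Array.Empty<byte>();
        if (text == null)
            return false;

        // Every 4 input characters decode to at most 3 bytes.
        byte[] buffer = new byte[(text.Length + 3) / 4 * 3];
        if (!Convert.TryFromBase64String(text, buffer, out int bytesWritten))
            return false;

        // Copy only the bytes that were actually written.
        bytes = buffer.AsSpan(0, bytesWritten).ToArray();
        return true;
    }
}
```

A caller would then write something like if (Base64Decoding.TryDecode(Value, out byte[] hashBytes)) { ... } instead of wrapping Convert.FromBase64String in a try/catch.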

Upvotes: 148

Anirudh Ramanathan

Reputation: 46728

Update: For newer versions of C#, there's a much better alternative, please refer to the answer by Tomas here: https://stackoverflow.com/a/54143400/125981.


It's pretty easy to recognize a Base64 string, as it will only be composed of the characters 'A'..'Z', 'a'..'z', '0'..'9', '+', '/', and it is often padded at the end with up to two '=' to make the length a multiple of 4. But instead of comparing these, you'd be better off simply catching the exception if it occurs.

Upvotes: 56

Roland

Reputation: 5226

The answer must depend on how the string is used. There are many strings that may be "valid base64" according to the syntax suggested by several posters, but that may "correctly" decode, without exception, to junk. Example: the 8-character string Portland is valid Base64. What is the point of stating that this is valid Base64? I guess that at some point you'd want to know whether this string should or should not be Base64 decoded.

In my case, I am reading Oracle connection strings from the app.config file, which may be either in plain text like:

Data source=mydb/DBNAME;User Id=Roland;Password=secret1;

or in base64 like

VXNlciBJZD1sa.....................................==

(my predecessor considered base64 as encryption :-)

In order to decide whether Base64 decoding is needed in this particular use case, I simply check whether the string starts with "Data" (case insensitive). This is much easier, faster, and more reliable than trying to decode and seeing whether an exception occurs:

if (!ConnectionString.StartsWith("data", StringComparison.OrdinalIgnoreCase))
{
  //..DecodeBase64..
}

I updated this answer; my old conclusion was:

I just have to check for the presence of a semicolon, because that proves that it is NOT base64, which is of course faster than any above method.

Upvotes: 6

Dimitri Troncquo

Reputation: 451

I just wanted to point out that none of the answers to date are very usable (depending on your use case, but bear with me).

All of them will return false positives for strings whose length is divisible by 4 and that contain no whitespace. If you adjust for missing padding, every string in the [a-zA-Z0-9]+ range will register as Base64 encoded.

It doesn't matter whether you check for valid characters and length or use the exception or TryConvert approach: all these methods return false positives.

Some simple examples:

  • "test" will register as base64 encoded
  • "test1" will register as base64 encoded if you adjust for missing padding (trailing '=')
  • "test test" will never register as base64 encoded
  • "tést" will never register as base64 encoded

I'm not saying the methods described here are useless, but you should be aware of the limitations before you use any of these in a production environment.
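
The false positive is easy to reproduce. The sketch below wraps Convert.FromBase64String in a helper (the names FalsePositiveDemo and Decodes are made up for this illustration) that reports whether the input decodes without an exception.

```csharp
using System;

public static class FalsePositiveDemo
{
    // Returns true when Convert.FromBase64String accepts the input without throwing.
    public static bool Decodes(string text)
    {
        try
        {
            Convert.FromBase64String(text);
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```

FalsePositiveDemo.Decodes("test") returns true, because "test" decodes to three arbitrary bytes, while FalsePositiveDemo.Decodes("tést") returns false because 'é' is not in the Base64 alphabet.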

Upvotes: 4

Sorry IwontTell

Reputation: 502

All of the answers have been digested into one function that combines their checks.

1) Use the function as below:

string encoded = "WW91ckJhc2U2NHN0cmluZw==";
Console.WriteLine("Is string base64=" + IsBase64(encoded));

2) Below is the function:

public bool IsBase64(string base64String)
{
    try
    {
        if (!base64String.Equals(Convert.ToBase64String(Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(Convert.FromBase64String(base64String)))), StringComparison.InvariantCultureIgnoreCase) & !System.Text.RegularExpressions.Regex.IsMatch(base64String, @"^[a-zA-Z0-9\+/]*={0,2}$"))
        {
            return false;
        }
        else if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0 || base64String.Contains(" ") || base64String.Contains("\t") || base64String.Contains("\r") || base64String.Contains("\n"))
        {
            return false;
        }
        else return true;
    }
    catch (FormatException)
    {
        return false;
    }
}

Upvotes: 1

Navdeep Kapil

Reputation: 381

Check Base64 or normal string

public bool IsBase64Encoded(String str)
{
    try
    {
        // If no exception is caught, then it is possibly a base64 encoded string
        byte[] data = Convert.FromBase64String(str);
        // The part that checks if the string was properly padded to the
        // correct length was borrowed from d@anish's solution
        return (str.Replace(" ","").Length % 4 == 0);
    }
    catch
    {
        // If exception is caught, then it is not a base64 encoded string
        return false;
    }
}

Upvotes: 0

Scholtz

Reputation: 3736

I prefer this usage:

    public static class StringExtensions
    {
        /// <summary>
        /// Check if string is Base64
        /// </summary>
        /// <param name="base64"></param>
        /// <returns></returns>
        public static bool IsBase64String(this string base64)
        {
            //https://stackoverflow.com/questions/6309379/how-to-check-for-a-valid-base64-encoded-string
            Span<byte> buffer = new Span<byte>(new byte[base64.Length]);
            return Convert.TryFromBase64String(base64, buffer, out int _);
        }
    }

Usage:

if(myStr.IsBase64String()){

    ...

}

Upvotes: 4

PKOS

Reputation: 71

Decode, re-encode, and compare the result to the original string:

public static Boolean IsBase64(this String str)
{
    if ((str.Length % 4) != 0)
    {
        return false;
    }

    //decode - encode and compare
    try
    {
        string decoded = System.Text.Encoding.UTF8.GetString(System.Convert.FromBase64String(str));
        string encoded = System.Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(decoded));
        if (str.Equals(encoded, StringComparison.InvariantCultureIgnoreCase))
        {
            return true;
        }
    }
    catch { }
    return false;
}

Upvotes: 7

harsimranb

Reputation: 2283

I know you said you didn't want to catch an exception. But, because catching an exception is more reliable, I will go ahead and post this answer.

public static bool IsBase64(this string base64String) {
     // Credit: oybek https://stackoverflow.com/users/794764/oybek
     if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0
        || base64String.Contains(" ") || base64String.Contains("\t") || base64String.Contains("\r") || base64String.Contains("\n"))
        return false;

     try{
         Convert.FromBase64String(base64String);
         return true;
     }
     catch(Exception exception){
     // Handle the exception
     }
     return false;
}

Update: I've updated the condition thanks to oybek to further improve reliability.

Upvotes: 53

JD Brennan

Reputation: 1112

I believe the regex should be:

    Regex.IsMatch(s, @"^[a-zA-Z0-9\+/]*={0,2}$")

Only matching one or two trailing '=' signs, not three.

s should be the string that will be checked. Regex is part of the System.Text.RegularExpressions namespace.
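
As a rough sketch, the pattern could be wrapped in a helper together with the length check mentioned in other answers. The names Base64Pattern and LooksLikeBase64 are hypothetical, and one detail deliberately differs from the pattern above: \z replaces $, because $ in .NET also matches just before a trailing newline. This is a purely syntactic check and says nothing about whether the decoded data is meaningful.

```csharp
using System.Text.RegularExpressions;

public static class Base64Pattern
{
    // \z is used instead of $ so that an input ending in '\n' is not accepted.
    private static readonly Regex Pattern =
        new Regex(@"^[a-zA-Z0-9\+/]*={0,2}\z", RegexOptions.Compiled);

    // True when the input is non-empty, has a length divisible by 4,
    // and consists of Base64 characters followed by at most two '='.
    public static bool LooksLikeBase64(string s)
    {
        return !string.IsNullOrEmpty(s) && s.Length % 4 == 0 && Pattern.IsMatch(s);
    }
}
```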

Upvotes: 21

testing

Reputation: 20279

IMHO this is not really possible. All posted solutions fail for strings like "test" and so on. If a string's length is divisible by 4, it is not null or empty, and it contains only valid Base64 characters, it will pass all tests. That can be many strings ...

So there is no real solution other than knowing that this is a Base64-encoded string. What I've come up with is this:

if (base64DecodedString.StartsWith("<xml>"))
{
    // This was really a base64 encoded string I was expecting. Yippie!
}
else
{
    // This is gibberish.
}

I expect that the decoded string begins with a certain structure, so I check for that.
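
A self-contained sketch of that idea, assuming the payload is UTF-8 text. The names ExpectedContent and TryDecodeExpected are invented for this example; the expected prefix is passed in by the caller.

```csharp
using System;
using System.Text;

public static class ExpectedContent
{
    // Hypothetical helper: decodes the input and accepts it only if the decoded
    // UTF-8 text starts with the prefix the caller expects.
    public static bool TryDecodeExpected(string input, string expectedPrefix, out string decoded)
    {
        decoded = string.Empty;
        if (input == null)
            return false;

        try
        {
            string text = Encoding.UTF8.GetString(Convert.FromBase64String(input));
            if (!text.StartsWith(expectedPrefix, StringComparison.Ordinal))
                return false;

            decoded = text;
            return true;
        }
        catch (FormatException)
        {
            // Not decodable at all, so certainly not the expected payload.
            return false;
        }
    }
}
```

With this, a string such as "test" is rejected even though it decodes without an exception, because the decoded bytes do not start with the expected structure.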

Upvotes: 5

germankiwi

Reputation: 1132

I have just had a very similar requirement where I am letting the user do some image manipulation in a <canvas> element and then sending the resulting image retrieved with .toDataURL() to the backend. I wanted to do some server validation before saving the image and have implemented a ValidationAttribute using some of the code from other answers:

[AttributeUsage(AttributeTargets.Property, AllowMultiple = false, Inherited = false)]
public class Base64PngImageAttribute : ValidationAttribute
{
    public override bool IsValid(object value)
    {
        if (value == null || string.IsNullOrWhiteSpace(value as string))
            return true; // not concerned with whether or not this field is required
        var base64string = (value as string).Trim();

        // we are expecting a URL type string
        if (!base64string.StartsWith("data:image/png;base64,"))
            return false;

        base64string = base64string.Substring("data:image/png;base64,".Length);

        // match length and regular expression
        if (base64string.Length % 4 != 0 || !Regex.IsMatch(base64string, @"^[a-zA-Z0-9\+/]*={0,2}$", RegexOptions.None))
            return false;

        // finally, try to convert it to a byte array and catch exceptions
        try
        {
            byte[] converted = Convert.FromBase64String(base64string);
            return true;
        }
        catch(Exception)
        {
            return false;
        }
    }
}

As you can see I am expecting an image/png type string, which is the default returned by <canvas> when using .toDataURL().

Upvotes: 0

Yaseer Arafat

Reputation: 81

I use it like this, so that I don't need to call the convert method again:

public static bool IsBase64(this string base64String, out byte[] bytes)
{
    bytes = null;
    // Credit: oybek http://stackoverflow.com/users/794764/oybek
    if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0
        || base64String.Contains(" ") || base64String.Contains("\t") || base64String.Contains("\r") || base64String.Contains("\n"))
        return false;

    try
    {
        bytes = Convert.FromBase64String(base64String);
        return true;
    }
    catch (Exception)
    {
        // Handle the exception
    }

    return false;
}

Upvotes: 3

user3181503

Reputation: 19

public static bool IsBase64String1(string value)
{
    if (string.IsNullOrEmpty(value))
    {
        return false;
    }
    try
    {
        Convert.FromBase64String(value);
        if (value.EndsWith("="))
        {
            value = value.Trim();
            int mod4 = value.Length % 4;
            if (mod4 != 0)
            {
                return false;
            }
            return true;
        }
        else
        {
            return false;
        }
    }
    catch (FormatException)
    {
        return false;
    }
}

Upvotes: 1

Oybek

Reputation: 7243

Just for the sake of completeness I want to provide some implementation. Generally speaking Regex is an expensive approach, especially if the string is large (which happens when transferring large files). The following approach tries the fastest ways of detection first.

public static class HelperExtensions {
    // Characters that are used in base64 strings.
    private static Char[] Base64Chars = new[] { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/' };
    /// <summary>
    /// Extension method to test whether the value is a base64 string
    /// </summary>
    /// <param name="value">Value to test</param>
    /// <returns>Boolean value, true if the string is base64, otherwise false</returns>
    public static Boolean IsBase64String(this String value) {

        // The quickest test. If the value is null or its length is 0, it is not base64.
        // A base64 string's length is always divisible by four, e.g. 8, 16, 20 etc.
        // If it is not, you can return false. Quite effective.
        // Further, if it meets the above criteria, then test for whitespace.
        // If it contains whitespace, it is not base64.
        if (value == null || value.Length == 0 || value.Length % 4 != 0
            || value.Contains(' ') || value.Contains('\t') || value.Contains('\r') || value.Contains('\n'))
            return false;

        // 98% of all non base64 values are invalidated by this time.
        var index = value.Length - 1;

        // if there is padding step back
        if (value[index] == '=')
            index--;

        // if there are two padding chars step back a second time
        if (value[index] == '=')
            index--;

        // Now traverse over characters
        // You should note that I'm not creating any copy of the existing strings, 
        // assuming that they may be quite large
        for (var i = 0; i <= index; i++) 
            // If any of the characters is not from the allowed list
            if (!Base64Chars.Contains(value[i]))
                // return false
                return false;

        // If we got here, then the value is a valid base64 string
        return true;
    }
}

EDIT

As suggested by Sam, you can also change the source code slightly. He provides a better performing approach for the last step of tests. The routine

    private static Boolean IsInvalid(char value) {
        var intValue = (Int32)value;

        // 0 - 9
        if (intValue >= 48 && intValue <= 57) 
            return false;

        // A - Z
        if (intValue >= 65 && intValue <= 90) 
            return false;

        // a - z
        if (intValue >= 97 && intValue <= 122) 
            return false;

        // + or /
        return intValue != 43 && intValue != 47;
    } 

can be used to replace if (!Base64Chars.Contains(value[i])) line with if (IsInvalid(value[i]))

The complete source code with enhancements from Sam will look like this (removed comments for clarity)

public static class HelperExtensions {
    public static Boolean IsBase64String(this String value) {
        if (value == null || value.Length == 0 || value.Length % 4 != 0
            || value.Contains(' ') || value.Contains('\t') || value.Contains('\r') || value.Contains('\n'))
            return false;
        var index = value.Length - 1;
        if (value[index] == '=')
            index--;
        if (value[index] == '=')
            index--;
        for (var i = 0; i <= index; i++)
            if (IsInvalid(value[i]))
                return false;
        return true;
    }
    // Make it private, as the name makes no sense for an outside caller
    private static Boolean IsInvalid(char value) {
        var intValue = (Int32)value;
        if (intValue >= 48 && intValue <= 57)
            return false;
        if (intValue >= 65 && intValue <= 90)
            return false;
        if (intValue >= 97 && intValue <= 122)
            return false;
        return intValue != 43 && intValue != 47;
    }
}

Upvotes: 10

Jason K

Reputation: 155

Knibb High football rules!

This should be relatively fast and accurate but I admit I didn't put it through a thorough test, just a few.

It avoids expensive exceptions and regex, and it also avoids looping through a character set, using ASCII ranges for validation instead.

public static bool IsBase64String(string s)
    {
        s = s.Trim();
        int mod4 = s.Length % 4;
        if(mod4!=0){
            return false;
        }
        int i=0;
        bool checkPadding = false;
        int paddingCount = 1;//only applies when the first is encountered.
        for(i=0;i<s.Length;i++){
            char c = s[i];
            if (checkPadding)
            {
                if (c != '=')
                {
                    return false;
                }
                paddingCount++;
                if (paddingCount > 2)
                {
                    return false;
                }
                continue;
            }
            if((c>='A' && c<='Z') || (c>='a' && c<='z') || (c>='0' && c<='9')){
                continue;
            }
            switch(c){ 
              case '+':
              case '/':
                 continue;
              case '=': 
                 checkPadding = true;
                 continue;
            }
            return false;
        }
        //if here
        //, length was correct
        //, there were no invalid characters
        //, padding was correct
        return true;
    }

Upvotes: 2

Tyler Eaves

Reputation: 13121

Why not just catch the exception, and return False?

This avoids additional overhead in the common case.

Upvotes: 6

Jay

Reputation: 6294

I would suggest creating a regex to do the job. You'll have to check for something like this: [a-zA-Z0-9+/=] You'll also have to check the length of the string. I'm not sure on this one, but I'm pretty sure that if something gets trimmed (other than the padding "="), it would blow up.

Or better yet check out this stackoverflow question

Upvotes: 0

Rob Raisch

Reputation: 17357

Yes, since Base64 encodes binary data into ASCII strings using a limited set of characters, you can simply check it with this regular expression:

/^[A-Za-z0-9\=\+\/\s\n]+$/s

which will assure the string only contains A-Z, a-z, 0-9, '+', '/', '=', and whitespace.

Upvotes: 0

user684934

Reputation:

Sure. Just make sure each character is within a-z, A-Z, 0-9, /, or +, and the string ends with ==. (At least, that's the most common Base64 implementation. You might find some implementations that use characters different from / or + for the last two characters.)

Upvotes: 0
