Reputation: 6867
Is there a way in C# to see if a string is Base 64 encoded other than just trying to convert it and seeing if there is an error? I have code like this:
// Convert base64-encoded hash value into a byte array.
byte[] HashBytes = Convert.FromBase64String(Value);
I want to avoid the "Invalid character in a Base-64 string" exception that happens if the value is not a valid base 64 string. I want to just check and return false instead of handling an exception, because I expect that sometimes this value is not going to be a base 64 string. Is there some way to check before using the Convert.FromBase64String function?
Upvotes: 187
Views: 249569
Reputation: 479
I know this is an old issue, but I've just recently stumbled onto the subject and have the same requirements.
According to Microsoft's documentation, there is now a method on Base64, Base64.IsValid, that will correctly validate whether a string (or byte array) is base64 encoded.
This was added in .NET 8.0 and is not available in .NET Standard.
https://learn.microsoft.com/en-us/dotnet/api/system.buffers.text.base64.isvalid?view=net-9.0
Validates that the specified span of text is comprised of valid base-64 encoded data.
true if base64Text contains a valid, decodable sequence of base-64 encoded data; otherwise, false.
In my limited testing, to satisfy my requirements, it returns false for plain text and true for an encoded string.
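A minimal sketch of how that call might be wrapped, assuming a project that targets .NET 8 or later. The class and method names (Base64IsValidSketch, LooksLikeBase64) are made up for illustration, and treating null as "not Base64" is my own choice rather than anything the API prescribes:

```csharp
using System;
using System.Buffers.Text;

public static class Base64IsValidSketch
{
    // Hypothetical helper name; Base64.IsValid itself requires .NET 8 or later.
    public static bool LooksLikeBase64(string text)
    {
        // Null is treated as "not Base64" here (an assumption of this sketch).
        return text != null && Base64.IsValid(text.AsSpan());
    }
}
```

With this, LooksLikeBase64("SGVsbG8gd29ybGQ=") should be true, and a plain sentence containing punctuation such as "Hello, world!" should be false.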
Upvotes: 0
Reputation: 18102
I'm quite surprised that no one has mentioned System.Buffers.Text.Base64, which was introduced in .NET Core 2.1 (and is part of .NET Standard 2.0).
It has an IsValid method to check whether a ReadOnlySpan<char> or a ReadOnlySpan<byte> is valid Base 64. Since string is implicitly convertible to ReadOnlySpan<char>, you can simply pass in a string as well. Its XML docs state:
If the method returns true, the same text passed to Convert.FromBase64String(string) and Convert.TryFromBase64Chars would successfully decode. Any amount of whitespace is allowed anywhere in the input, where whitespace is defined as the characters ' ', '\t', '\r', or '\n'.
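As a rough illustration of how validation and decoding could be combined, here is a sketch that uses the IsValid overload reporting the decoded length and then Convert.TryFromBase64Chars. It assumes .NET 8 or later; the names Base64DecodeSketch and TryDecode are hypothetical:

```csharp
using System;
using System.Buffers.Text;

public static class Base64DecodeSketch
{
    // Hypothetical helper: validates first, then decodes into an exactly sized buffer.
    public static bool TryDecode(string text, out byte[] bytes)
    {
        bytes = Array.Empty<byte>();

        // IsValid reports how many bytes the decoded data needs.
        if (text == null || !Base64.IsValid(text.AsSpan(), out int decodedLength))
            return false;

        var buffer = new byte[decodedLength];
        if (!Convert.TryFromBase64Chars(text.AsSpan(), buffer, out int written))
            return false;

        // Defensive: trim if fewer bytes were written than were reserved.
        bytes = written == buffer.Length ? buffer : buffer.AsSpan(0, written).ToArray();
        return true;
    }
}
```

The point of this shape is that no exception is thrown for invalid input, and the caller gets the decoded bytes without decoding twice.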
Using this method has several advantages over all existing answers provided here:
SearchValues<T> and vectorization
Upvotes: 1
Reputation: 25098
Use Convert.TryFromBase64String from C# 7.2 (.NET Core 2.1+ or .NET Standard 2.1 and higher).
public static bool IsBase64String(string base64)
{
Span<byte> buffer = new Span<byte>(new byte[base64.Length]);
return Convert.TryFromBase64String(base64, buffer , out int bytesParsed);
}
Upvotes: 148
Reputation: 46728
Update: For newer versions of C#, there's a much better alternative, please refer to the answer by Tomas here: https://stackoverflow.com/a/54143400/125981.
It's pretty easy to recognize a Base64 string, as it will only be composed of the characters 'A'..'Z', 'a'..'z', '0'..'9', '+', and '/', and it is often padded at the end with up to two '=' to make the length a multiple of 4. But instead of comparing these, you'd be better off ignoring the exception, if it occurs.
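For anyone who does want to compare the characters by hand, a sketch of that check could look like the following. It only tests the shape described above (standard alphabet, length a multiple of 4, at most two trailing '='); the names Base64ShapeSketch and HasBase64Shape are hypothetical:

```csharp
using System;

public static class Base64ShapeSketch
{
    // Hypothetical helper: checks shape only, not whether the decoded data is meaningful.
    public static bool HasBase64Shape(string s)
    {
        if (string.IsNullOrEmpty(s) || s.Length % 4 != 0)
            return false;

        // Step back over at most two trailing padding characters.
        int end = s.Length;
        int padding = 0;
        while (padding < 2 && s[end - 1] == '=')
        {
            end--;
            padding++;
        }

        // Everything before the padding must come from the standard alphabet.
        for (int i = 0; i < end; i++)
        {
            char c = s[i];
            bool inAlphabet = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                              || (c >= '0' && c <= '9') || c == '+' || c == '/';
            if (!inAlphabet)
                return false;
        }
        return true;
    }
}
```

Note that an ordinary word such as "test" also has this shape, so a true result only means the string could be Base64.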
Upvotes: 56
Reputation: 5226
The answer must depend on the usage of the string. There are many strings that may be "valid base64" according to the syntax suggested by several posters, but that may "correctly" decode, without exception, to junk. Example: the 8-character string Portland is valid Base64. What is the point of stating that this is valid Base64? I guess that at some point you'd want to know that this string should or should not be Base64 decoded.
In my case, I am reading Oracle connection strings from file app.config that may be either in plain text like:
Data source=mydb/DBNAME;User Id=Roland;Password=secret1;
or in base64 like
VXNlciBJZD1sa.....................................==
(my predecessor considered base64 as encryption :-)
In order to decide if base64 decoding is needed, in this particular use case, I should simply check if the string starts with "Data" (case insensitive). This is much easier, faster, and more reliable than just trying to decode and seeing whether an exception occurs:
if (!ConnectionString.StartsWith("data", StringComparison.OrdinalIgnoreCase))
{
//..DecodeBase64..
}
I updated this answer; my old conclusion was:
I just have to check for the presence of a semicolon, because that proves that it is NOT base64, which is of course faster than any above method.
Upvotes: 6
Reputation: 451
I just wanted to point out that none of the answers to date are very usable (depending on your use case, but bear with me).
All of them will return false positives for strings of a length divisible by 4, not containing whitespace. If you adjust for missing padding, all strings within the [a-zA-Z0-9]+ range will register as base64 encoded.
It doesn't matter if you check for valid characters and length, or use the Exception or TryConvert approach, all these methods return false positives.
Some simple examples:
"test" will register as base64 encoded
"test1" will register as base64 encoded if you adjust for missing padding (trailing '=')
"test test" will never register as base64 encoded
"tést" will never register as base64 encoded
I'm not saying the methods described here are useless, but you should be aware of the limitations before you use any of these in a production environment.
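A small sketch that reproduces some of these cases with the framework's own decoder, assuming .NET Core 2.1 or later for Convert.TryFromBase64String. The names FalsePositiveSketch and FrameworkAccepts are hypothetical. One caveat: as far as I know the Convert methods skip whitespace, so a check built directly on them may treat "test test" differently from the checks that reject whitespace up front:

```csharp
using System;

public static class FalsePositiveSketch
{
    // Hypothetical helper: true when the framework decoder accepts the input as-is.
    public static bool FrameworkAccepts(string s)
    {
        // The decoded data is never longer than the input, so this buffer is large enough.
        Span<byte> buffer = new byte[s.Length];
        return Convert.TryFromBase64String(s, buffer, out _);
    }
}
```

Here FrameworkAccepts("test") is true even though "test" is an ordinary word, while "test1" (wrong length) and "tést" (character outside the alphabet) are rejected.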
Upvotes: 4
Reputation: 502
All the answers have been digested into one function that is meant to give accurate results.
1) Use function as below:
string encoded = "WW91ckJhc2U2NHN0cmluZw==";
Console.WriteLine("Is string base64=" + IsBase64(encoded));
2) Below is the function:
public bool IsBase64(string base64String)
{
    // Reject null, empty, wrongly sized, or whitespace-containing input before decoding.
    if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0
        || base64String.Contains(" ") || base64String.Contains("\t")
        || base64String.Contains("\r") || base64String.Contains("\n"))
    {
        return false;
    }
    try
    {
        // Round trip: decode, re-encode, and compare with the input.
        string roundTripped = Convert.ToBase64String(Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(Convert.FromBase64String(base64String))));
        bool sameAfterRoundTrip = base64String.Equals(roundTripped, StringComparison.InvariantCultureIgnoreCase);
        bool matchesPattern = System.Text.RegularExpressions.Regex.IsMatch(base64String, @"^[a-zA-Z0-9\+/]*={0,2}$");
        // As in the original logic, the input fails only when both checks fail.
        return sameAfterRoundTrip || matchesPattern;
    }
    catch (FormatException)
    {
        return false;
    }
}
Upvotes: 1
Reputation: 381
Check Base64 or normal string
public bool IsBase64Encoded(String str)
{
try
{
// If no exception is caught, then it is possibly a base64 encoded string
byte[] data = Convert.FromBase64String(str);
// The part that checks if the string was properly padded to the
// correct length was borrowed from d@anish's solution
return (str.Replace(" ","").Length % 4 == 0);
}
catch
{
// If exception is caught, then it is not a base64 encoded string
return false;
}
}
Upvotes: 0
Reputation: 3736
I prefer this usage:
public static class StringExtensions
{
/// <summary>
/// Check if string is Base64
/// </summary>
/// <param name="base64"></param>
/// <returns></returns>
public static bool IsBase64String(this string base64)
{
//https://stackoverflow.com/questions/6309379/how-to-check-for-a-valid-base64-encoded-string
Span<byte> buffer = new Span<byte>(new byte[base64.Length]);
return Convert.TryFromBase64String(base64, buffer, out int _);
}
}
Then usage
if(myStr.IsBase64String()){
...
}
Upvotes: 4
Reputation: 71
Decode, re-encode, and compare the result to the original string:
public static Boolean IsBase64(this String str)
{
if ((str.Length % 4) != 0)
{
return false;
}
//decode - encode and compare
try
{
string decoded = System.Text.Encoding.UTF8.GetString(System.Convert.FromBase64String(str));
string encoded = System.Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(decoded));
if (str.Equals(encoded, StringComparison.InvariantCultureIgnoreCase))
{
return true;
}
}
catch { }
return false;
}
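A variant of the same round-trip idea, sketched here at the byte level so that it does not depend on the decoded data being valid UTF-8 text. This is my own variation rather than the code above, and the names RoundTripSketch and IsCanonicalBase64 are hypothetical. It uses an ordinal, case-sensitive comparison because Base64 is case sensitive:

```csharp
using System;

public static class RoundTripSketch
{
    // Hypothetical helper: true when decoding and re-encoding reproduces the input exactly.
    public static bool IsCanonicalBase64(string s)
    {
        if (string.IsNullOrEmpty(s))
            return false;
        try
        {
            byte[] raw = Convert.FromBase64String(s);
            // Re-encode the raw bytes; no text encoding is involved.
            return string.Equals(Convert.ToBase64String(raw), s, StringComparison.Ordinal);
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```

Because the re-encoded string never contains whitespace, input with embedded spaces or line breaks fails the comparison even if the decoder accepts it.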
Upvotes: 7
Reputation: 2283
I know you said you didn't want to catch an exception. But, because catching an exception is more reliable, I will go ahead and post this answer.
public static bool IsBase64(this string base64String) {
// Credit: oybek https://stackoverflow.com/users/794764/oybek
if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0
|| base64String.Contains(" ") || base64String.Contains("\t") || base64String.Contains("\r") || base64String.Contains("\n"))
return false;
try{
Convert.FromBase64String(base64String);
return true;
}
catch(Exception exception){
// Handle the exception
}
return false;
}
Update: I've updated the condition thanks to oybek to further improve reliability.
Upvotes: 53
Reputation: 1112
I believe the regex should be:
Regex.IsMatch(s, @"^[a-zA-Z0-9\+/]*={0,2}$")
Only matching one or two trailing '=' signs, not three.
s should be the string that will be checked. Regex is part of the System.Text.RegularExpressions namespace.
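A sketch of how the pattern might be combined with the length check that other answers mention. The names Base64RegexSketch and Matches are hypothetical. One deliberate change: I anchor with \z instead of $, on the assumption that you do not want a trailing newline to slip through (in .NET, $ also matches just before a final '\n'):

```csharp
using System;
using System.Text.RegularExpressions;

public static class Base64RegexSketch
{
    // Same character class as above, anchored with \z so a trailing newline is not accepted.
    private static readonly Regex Pattern =
        new Regex(@"^[a-zA-Z0-9\+/]*={0,2}\z", RegexOptions.Compiled);

    // Hypothetical helper: the regex alone does not enforce the multiple-of-4 length.
    public static bool Matches(string s)
    {
        return !string.IsNullOrEmpty(s) && s.Length % 4 == 0 && Pattern.IsMatch(s);
    }
}
```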
Upvotes: 21
Reputation: 20279
IMHO this is not really possible. All posted solutions fail for strings like "test" and so on. If their length is divisible by 4, they are not null or empty, and they contain only valid base64 characters, they will pass all tests. That can be many strings ...
So there is no real solution other than knowing that this is a base 64 encoded string. What I've come up with is this:
if (base64DecodedString.StartsWith("<xml>"))
{
// This was really a base64 encoded string I was expecting. Yippie!
}
else
{
// This is gibberish.
}
I expect that the decoded string begins with a certain structure, so I check for that.
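To make the idea concrete, here is a sketch that decodes a candidate string and then checks for the expected start of the payload. It assumes the payload is UTF-8 text; the names ExpectedPayloadSketch and DecodesToExpectedPrefix are hypothetical, and the prefix is passed in rather than hard-coded:

```csharp
using System;
using System.Text;

public static class ExpectedPayloadSketch
{
    // Hypothetical helper: true only if the input decodes AND the decoded text starts as expected.
    public static bool DecodesToExpectedPrefix(string candidate, string expectedPrefix)
    {
        if (string.IsNullOrEmpty(candidate))
            return false;
        try
        {
            // Assumes the payload is UTF-8 text.
            string decoded = Encoding.UTF8.GetString(Convert.FromBase64String(candidate));
            return decoded.StartsWith(expectedPrefix, StringComparison.Ordinal);
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```

A word like "test" still decodes without an exception, but the decoded bytes do not start with "<xml>", so it is rejected at the second step.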
Upvotes: 5
Reputation: 1132
I have just had a very similar requirement where I am letting the user do some image manipulation in a <canvas> element and then sending the resulting image retrieved with .toDataURL() to the backend. I wanted to do some server validation before saving the image and have implemented a ValidationAttribute using some of the code from other answers:
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false, Inherited = false)]
public class Base64PngImageAttribute : ValidationAttribute
{
public override bool IsValid(object value)
{
if (value == null || string.IsNullOrWhiteSpace(value as string))
return true; // not concerned with whether or not this field is required
var base64string = (value as string).Trim();
// we are expecting a URL type string
if (!base64string.StartsWith("data:image/png;base64,"))
return false;
base64string = base64string.Substring("data:image/png;base64,".Length);
// match length and regular expression
if (base64string.Length % 4 != 0 || !Regex.IsMatch(base64string, @"^[a-zA-Z0-9\+/]*={0,2}$", RegexOptions.None))
return false;
// finally, try to convert it to a byte array and catch exceptions
try
{
byte[] converted = Convert.FromBase64String(base64string);
return true;
}
catch(Exception)
{
return false;
}
}
}
As you can see I am expecting an image/png type string, which is the default returned by <canvas> when using .toDataURL().
Upvotes: 0
Reputation: 81
I use it like this so that I don't need to call the convert method again:
public static bool IsBase64(this string base64String,out byte[] bytes)
{
bytes = null;
// Credit: oybek http://stackoverflow.com/users/794764/oybek
if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0
|| base64String.Contains(" ") || base64String.Contains("\t") || base64String.Contains("\r") || base64String.Contains("\n"))
return false;
try
{
bytes=Convert.FromBase64String(base64String);
return true;
}
catch (Exception)
{
// Handle the exception
}
return false;
}
Upvotes: 3
Reputation: 19
public static bool IsBase64String1(string value)
{
if (string.IsNullOrEmpty(value))
{
return false;
}
try
{
Convert.FromBase64String(value);
if (value.EndsWith("="))
{
value = value.Trim();
int mod4 = value.Length % 4;
if (mod4 != 0)
{
return false;
}
return true;
}
else
{
return false;
}
}
catch (FormatException)
{
return false;
}
}
Upvotes: 1
Reputation: 7243
Just for the sake of completeness I want to provide some implementation. Generally speaking Regex is an expensive approach, especially if the string is large (which happens when transferring large files). The following approach tries the fastest ways of detection first.
public static class HelperExtensions {
// Characters that are used in base64 strings.
private static Char[] Base64Chars = new[] { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/' };
/// <summary>
/// Extension method to test whether the value is a base64 string
/// </summary>
/// <param name="value">Value to test</param>
/// <returns>Boolean value, true if the string is base64, otherwise false</returns>
public static Boolean IsBase64String(this String value) {
// The quickest test. If the value is null or its length is 0, it is not base64
// Base64 string's length is always divisible by four, i.e. 8, 16, 20 etc.
// If it is not you can return false. Quite effective
// Further, if it meets the above criteria, then test for spaces.
// If it contains spaces, it is not base64
if (value == null || value.Length == 0 || value.Length % 4 != 0
|| value.Contains(' ') || value.Contains('\t') || value.Contains('\r') || value.Contains('\n'))
return false;
// 98% of all non base64 values are invalidated by this time.
var index = value.Length - 1;
// if there is padding step back
if (value[index] == '=')
index--;
// if there are two padding chars step back a second time
if (value[index] == '=')
index--;
// Now traverse over characters
// You should note that I'm not creating any copy of the existing strings,
// assuming that they may be quite large
for (var i = 0; i <= index; i++)
// If any of the character is not from the allowed list
if (!Base64Chars.Contains(value[i]))
// return false
return false;
// If we got here, then the value is a valid base64 string
return true;
}
}
EDIT
As suggested by Sam, you can also change the source code slightly. He provides a better performing approach for the last step of tests. The routine
private static Boolean IsInvalid(char value) {
var intValue = (Int32)value;
// 0 - 9
if (intValue >= 48 && intValue <= 57)
return false;
// A - Z
if (intValue >= 65 && intValue <= 90)
return false;
// a - z
if (intValue >= 97 && intValue <= 122)
return false;
// + or /
return intValue != 43 && intValue != 47;
}
can be used to replace the if (!Base64Chars.Contains(value[i])) line with if (IsInvalid(value[i]))
The complete source code with enhancements from Sam will look like this (removed comments for clarity)
public static class HelperExtensions {
public static Boolean IsBase64String(this String value) {
if (value == null || value.Length == 0 || value.Length % 4 != 0
|| value.Contains(' ') || value.Contains('\t') || value.Contains('\r') || value.Contains('\n'))
return false;
var index = value.Length - 1;
if (value[index] == '=')
index--;
if (value[index] == '=')
index--;
for (var i = 0; i <= index; i++)
if (IsInvalid(value[i]))
return false;
return true;
}
// Make it private, as the name makes no sense for an outside caller
private static Boolean IsInvalid(char value) {
var intValue = (Int32)value;
if (intValue >= 48 && intValue <= 57)
return false;
if (intValue >= 65 && intValue <= 90)
return false;
if (intValue >= 97 && intValue <= 122)
return false;
return intValue != 43 && intValue != 47;
}
}
Upvotes: 10
Reputation: 155
Knibb High football rules!
This should be relatively fast and accurate but I admit I didn't put it through a thorough test, just a few.
It avoids expensive exceptions, regex, and also avoids looping through a character set, instead using ascii ranges for validation.
public static bool IsBase64String(string s)
{
s = s.Trim();
int mod4 = s.Length % 4;
if(mod4!=0){
return false;
}
int i=0;
bool checkPadding = false;
int paddingCount = 1;//only applies when the first is encountered.
for(i=0;i<s.Length;i++){
char c = s[i];
if (checkPadding)
{
if (c != '=')
{
return false;
}
paddingCount++;
if (paddingCount > 2)
{
return false;
}
continue;
}
if((c>='A' && c<='Z') || (c>='a' && c<='z') || (c>='0' && c<='9')){
continue;
}
switch(c){
case '+':
case '/':
continue;
case '=':
checkPadding = true;
continue;
}
return false;
}
//if here
//, length was correct
//, there were no invalid characters
//, padding was correct
return true;
}
Upvotes: 2
Reputation: 13121
Why not just catch the exception, and return False?
This avoids additional overhead in the common case.
Upvotes: 6
Reputation: 6294
I would suggest creating a regex to do the job. You'll have to check for something like this: [a-zA-Z0-9+/=]. You'll also have to check the length of the string. I'm not sure on this one, but I'm pretty sure that if something gets trimmed (other than the padding "="), it would blow up.
Or better yet check out this stackoverflow question
Upvotes: 0
Reputation: 17357
Yes, since Base64 encodes binary data into ASCII strings using a limited set of characters, you can simply check it with this regular expression:
/^[A-Za-z0-9\=\+\/\s\n]+$/s
which will assure the string only contains A-Z, a-z, 0-9, '+', '/', '=', and whitespace.
Upvotes: 0
Reputation:
Sure. Just make sure each character is within a-z, A-Z, 0-9, /, or +, and the string ends with at most two = padding characters. (At least, that's the most common Base64 implementation. You might find some implementations that use characters different from / or + for the last two characters.)
Upvotes: 0