Christopher

Reputation: 2057

Strange behaviour of String.Format when (mis-)using placeholders

When I learned about the String.Format function, I made the mistake of thinking that it's acceptable to name the placeholders after the colon, so I wrote code like this:

String.Format("A message: '{0:message}'", "My message");
//output: "A message: 'My message'"

I just realized that the string behind the colon is used to define the format of the placeholder and may not be used to add a comment as I did.

But apparently, the string after the colon replaces the value in the output if:

  1. I fill the placeholder with an integer and
  2. I use an unrecognized format string after the colon

That still doesn't explain to me why the string after the colon ends up in the output when I provide an integer.

Some examples:

//Works for strings
String.Format("My number is {0:number}!", "10")
//output: "My number is 10!"

//Works without a format string
String.Format("My number is {0}!", 10)
//output: "My number is 10!"

//Works with a recognized format string
String.Format("My number is {0:d}!", 10)
//output: "My number is 10!"

//Does not work with an unrecognized format string
String.Format("My number is {0:number}!", 10)
//output: "My number is number!"

Why is there a difference between the handling of strings and integers? And why is the fallback to output the format string instead of the given value?

Upvotes: 0

Views: 1700

Answers (4)

Steve

Reputation: 216313

Interesting behavior indeed BUT NOT unaccounted for.
Your last example works with

String.Format("My number is {0:n}!", 10)

but reverts to the observed behavior with

String.Format("My number is {0:nu}!", 10)

This prompted me to look up the Standard Numeric Format Strings article on MSDN, where you can read:

Standard numeric format strings are used to format common numeric types. A standard numeric format string takes the form Axx, where:

A is a single alphabetic character called the format specifier. Any numeric format string that contains more than one alphabetic character, including white space, is interpreted as a custom numeric format string. For more information, see Custom Numeric Format Strings.

The same article explains that a SINGLE letter that is not recognized results in an exception. Indeed,

String.Format("My number is {0:K}!", 10)

throws a FormatException, as explained.

Now, looking at the Custom Numeric Format Strings article, you will find a table of supported characters and how they can be combined, but at the end of the table you can read:

Other (all other characters): The character is copied to the result string unchanged.

So I think you have created a custom format string that cannot print the number at all, because it contains no digit placeholder (such as 0 or #) where the number 10 would be inserted. Every character of "number" is simply copied to the result.
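To see the three cases side by side, here is a small sketch. The class name FormatProbe and the helper Try are made up for illustration, and the invariant culture is used only so the output does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, not part of the framework.
public static class FormatProbe
{
    // Formats the integer 10 with the given format string.
    // Returns a marker string instead of letting a FormatException escape.
    public static string Try(string format)
    {
        try
        {
            return 10.ToString(format, CultureInfo.InvariantCulture);
        }
        catch (FormatException)
        {
            return "<FormatException>";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Try("n"));      // one recognized letter: standard format string
        Console.WriteLine(Try("K"));      // one unrecognized letter: exception
        Console.WriteLine(Try("nu"));     // several letters: custom format, copied as-is
        Console.WriteLine(Try("number")); // same, no digit placeholder anywhere
        Console.WriteLine(Try("0 nu"));   // custom format with a digit placeholder
    }
}
```

The last call is the interesting one: as soon as the custom string contains a 0 or #, the value has somewhere to go and shows up next to the literal text.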

Upvotes: 1

Eugene Podskal

Reputation: 10401

The String.Format article on MSDN has the following description:

A format item has this syntax: { index[,alignment][ :formatString] }

...

formatString Optional.

A string that specifies the format of the corresponding argument's result string. If you omit formatString, the corresponding argument's parameterless ToString method is called to produce its string representation. If you specify formatString, the argument referenced by the format item must implement the IFormattable interface.

If we format the value directly through IFormattable, we get the same result:

String garbageFormatted = (10 as IFormattable).ToString("garbage in place of int",  
    CultureInfo.CurrentCulture.NumberFormat);

Console.WriteLine(garbageFormatted); // Writes the "garbage in place of int"

So it seems to be something close to a "garbage in, garbage out" problem in the implementation of the IFormattable interface on the Int32 type (and possibly on other types as well). The String class does not implement IFormattable, so any format specifier is left unused and the parameterless ToString() is called instead.
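The rule quoted from MSDN can be sketched for a single format item as follows. FormatItem is a hypothetical helper written for this answer, not the framework's actual implementation, and it ignores alignment and custom format providers.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch of how one format item is resolved.
public static class CompositeSketch
{
    public static string FormatItem(object arg, string formatString)
    {
        // Types that implement IFormattable (Int32, DateTime, ...) receive the format string.
        if (arg is IFormattable formattable)
        {
            return formattable.ToString(formatString, CultureInfo.InvariantCulture);
        }

        // Everything else, including String, ignores it and uses plain ToString().
        return arg?.ToString() ?? string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(FormatItem("10", "number")); // string: format string is ignored
        Console.WriteLine(FormatItem(10, "d"));        // int with a standard format string
        Console.WriteLine(FormatItem(10, "number"));   // int with a custom format string
    }
}
```

Under these assumptions the first two calls print 10 and the third prints number, which matches the difference between strings and integers observed in the question.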

Also:

Ildasm shows that Int32.ToString(String, IFormatProvider) internally calls

 string System.Number::FormatInt32(int32,
     string,
     class System.Globalization.NumberFormatInfo)

But it is an internalcall method (an extern implemented somewhere in native code), so Ildasm is of no use if we want to determine the source of the problem.

EDIT - CULPRIT:

After reading the question "How to see code of method which marked as MethodImplOptions.InternalCall?", I used the source code from the Shared Source Common Language Infrastructure 2.0 Release (it is .NET 2.0, but nonetheless) in an attempt to find the culprit.

The code for Number.FormatInt32 is located in the ...\sscli20\clr\src\vm\comnumber.cpp file.

The culprit can be deduced from the default section of the format switch statement in FCIMPL3(Object*, COMNumber::FormatInt32, INT32 value, StringObject* formatUNSAFE, NumberFormatInfo* numfmtUNSAFE):

default:
    NUMBER number;
    Int32ToNumber(value, &number);
    if (fmt != 0) {
      gc.refRetString = NumberToString(&number, fmt, digits, gc.refNumFmt);
      break;
    }
    gc.refRetString = NumberToStringFormat(&number, gc.refFormat, gc.refNumFmt);
    break;

The fmt variable is 0, so NumberToStringFormat(&number, gc.refFormat, gc.refNumFmt); is called.

That leads us to the default section of the second switch statement in the NumberToStringFormat method, which sits inside the loop that walks over every character of the format string. It is very simple:

default:
    *dst++ = ch;

It simply copies every character from the format string into the output array, and that is how the format string ends up repeated in the output.

On the one hand, this makes it possible to use garbage format strings that output nothing useful. On the other hand, it allows you to write something like:

String garbageFormatted = (1234 as IFormattable).ToString("0 thousands and ### in thousand",
    CultureInfo.CurrentCulture.NumberFormat);

Console.WriteLine(garbageFormatted); 
// Writes the "1 thousands and 234 in thousand"

that can be handy in some situations.

Upvotes: 2

Hans Passant

Reputation: 942010

Just review the MSDN page about composite formatting for clarity.

As a basic synopsis, the format item syntax is:

 { index[,alignment][:formatString]}

So what appears after the : colon is the formatString. Look at the "Format String Component" section of the MSDN page for what kind of format strings are predefined. You will not see System.String mentioned in that list. Which is no great surprise, a string is already "formatted" and will only ever appear in the output as-is.

Composite formatting is pretty lenient to mistakes, it won't throw an exception when you specify an illegal format string. That the one you used isn't legal is already pretty evident from the output you get. And most of all, the scheme is extensible. You can actually make a :message format string legal, a class can implement the ICustomFormatter interface to implement its own custom formatting. Which of course isn't going to happen on System.String, you cannot modify that class.
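As a rough illustration of that extensibility, the sketch below makes :message and :number legal by passing a custom provider to String.Format. The class name LabelTolerantFormatter and the choice of labels are assumptions made up for this example. Note that the formatter is supplied through an IFormatProvider rather than implemented on System.String itself.

```csharp
using System;
using System.Globalization;

// Hypothetical provider that treats "message" and "number" as harmless labels.
public sealed class LabelTolerantFormatter : IFormatProvider, ICustomFormatter
{
    // String.Format asks the provider for an ICustomFormatter; hand back this instance.
    public object GetFormat(Type formatType)
    {
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        // Labels: ignore the format string and print the value itself.
        if (format == "message" || format == "number")
        {
            return Convert.ToString(arg, CultureInfo.InvariantCulture);
        }

        // Anything else: fall back to the argument's own formatting.
        if (arg is IFormattable formattable)
        {
            return formattable.ToString(format, CultureInfo.InvariantCulture);
        }
        return arg?.ToString() ?? string.Empty;
    }
}

public static class Demo
{
    public static void Main()
    {
        var provider = new LabelTolerantFormatter();
        Console.WriteLine(string.Format(provider, "My number is {0:number}!", 10));
        Console.WriteLine(string.Format(provider, "A message: '{0:message}'", "My message"));
    }
}
```

With this provider the integer case prints "My number is 10!" instead of "My number is number!". Whether that is worth the extra class just to annotate placeholders is a separate question.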

So this works as expected. If you don't get the output you expected then this is pretty easy to debug, you've just got two mistakes to consider. The debugger eliminates one (wrong argument), your eyes eliminate the other.

Upvotes: 4

Mick

Reputation: 6864

No, it's not acceptable to place anything you like after the colon. Putting anything other than a recognized format specifier there is likely to result in either an exception or unpredictable behaviour, as you've demonstrated. I don't think you can expect string.Format to behave consistently when you're passing it arguments which are completely inconsistent with the documented formatting types.

Upvotes: 0
