Reputation: 8921
I am trying to create a generic formatter/parser combination.
Example scenario:
var format = "{0}-{1}";
var arr = new[] { "asdf", "qwer" };
var res = string.Format(format, arr);
What I am trying to do is convert the formatted string back into the original array of objects (strings). Something like (pseudocode):
var arr2 = string.Unformat(format, res)
// when: res = "asdf-qwer"
// arr2 should be equal to arr
Does anyone have experience doing something like this? I'm thinking about using regular expressions: modify the original format string, pass it to Regex.Matches to get the array, and run it for each placeholder in the format string. Is this feasible, or is there a more efficient solution?
Upvotes: 22
Views: 23723
Reputation: 71
This is my implementation. Parsing is then done somewhere outside.
/// <summary>
/// If a string was formatted e.g. with the format string "{0} … {1}", this method extracts the <i>original strings</i> which were
/// placed in the place holders {0} and {1}.
/// This method only works if the strings between the <i>original strings</i> are not contained in the <i>original strings</i>. E.g. if you put "0 … 1"
/// and "2" in the aforementioned format string, you obtain "0 … 1 … 2" which cannot be converted back properly.
/// </summary>
/// <param name="completeString">The complete string which was formatted using the format string.</param>
/// <param name="formatString">The format string.</param>
/// <returns>The strings which were formatted using the format string. For "Hello … World" in our example, "Hello" and "World" would be returned.</returns>
public static string[] ExtractFromFormattedString(string completeString, string formatString) {
/* Replace everything in curly brackets (the place where the actual strings are placed) by the NeverUsedCharacter.
* .*? = everything in the brackets, but as few as possible (?) */
formatString = Regex.Replace(formatString, "{.*?}", NeverUsedCharacter);
string[] splitFormatString = formatString.Split(NeverUsedCharacter.ToCharArray()); // ToCharArray: Convert the string with a single character to a character.
List<string> returnValue = new();
for (int i = 0; i < splitFormatString.Length - 1; i++) {
completeString = removeFirst(completeString, splitFormatString[i]); // Remove everything from the start to the first opening curly bracket in the original format string.
int endIndex = completeString.IndexOf(splitFormatString[i + 1]); // The end index is the beginning of the next part of the format string.
string match;
if (endIndex == 0) { // can happen if there is nothing after the last {…} in the format string. Then splitFormatString[i + 1] is empty.
match = completeString;
}
else {
match = completeString.Substring(0, endIndex); // Everything in between is what we are looking for.
}
returnValue.Add(match);
completeString = removeFirst(completeString, match); // Remove the match so that we can start again at index 0 in the next iteration.
}
return returnValue.ToArray();
// local method to remove only the first occurrence of the stringToRemove in the originalString (string.Replace replaces all occurrences).
string removeFirst(string originalString, string stringToRemove) {
int startIndex = originalString.IndexOf(stringToRemove);
int endIndex = startIndex + stringToRemove.Length;
return originalString.Substring(0, startIndex) + originalString.Substring(endIndex, originalString.Length - endIndex);
}
}
Upvotes: 0
Reputation: 10536
You can't unformat because information is lost. String.Format is a "destructive" algorithm, which means you can't (always) go back.
Create a new class inheriting from string, where you add a member that keeps track of the "{0}-{1}" and the { "asdf", "qwer" }, override ToString(), and modify your code a little.
If it becomes too tricky, just create the same class, but not inheriting from string, and modify your code a little more.
IMO, that's the best way to do this.
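A minimal sketch of the second variant described above (a class that does not inherit from string) might look like the following. System.String is sealed in .NET, so a wrapper is the only variant that compiles. The class name FormattedString and its members are hypothetical, chosen only for illustration.

```csharp
using System;

// Hypothetical wrapper type: it keeps the format string and the arguments
// together so the original values never have to be recovered from the text.
public sealed class FormattedString
{
    public string Format { get; }
    public object[] Args { get; }

    public FormattedString(string format, params object[] args)
    {
        Format = format ?? throw new ArgumentNullException(nameof(format));
        Args = args ?? throw new ArgumentNullException(nameof(args));
    }

    // Yields the same text that string.Format(Format, Args) would.
    public override string ToString() => string.Format(Format, Args);
}
```

With this, new FormattedString("{0}-{1}", "asdf", "qwer") can be passed wherever the formatted text is needed via ToString(), while Args still holds the original values.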
Upvotes: 13
Reputation: 707
While the comments about lost information are valid, sometimes you just want to get the string values out of a string with known formatting.
One method is this blog post written by a friend of mine. He implemented an extension method called string[] ParseExact(), akin to DateTime.ParseExact(). Data is returned as an array of strings, but if you can live with that, it is terribly handy.
public static class StringExtensions
{
public static string[] ParseExact(
this string data,
string format)
{
return ParseExact(data, format, false);
}
public static string[] ParseExact(
this string data,
string format,
bool ignoreCase)
{
string[] values;
if (TryParseExact(data, format, out values, ignoreCase))
return values;
else
throw new ArgumentException("Format not compatible with value.");
}
public static bool TryExtract(
this string data,
string format,
out string[] values)
{
return TryParseExact(data, format, out values, false);
}
public static bool TryParseExact(
this string data,
string format,
out string[] values,
bool ignoreCase)
{
int tokenCount = 0;
format = Regex.Escape(format).Replace("\\{", "{");
for (tokenCount = 0; ; tokenCount++)
{
string token = string.Format("{{{0}}}", tokenCount);
if (!format.Contains(token)) break;
format = format.Replace(token,
string.Format("(?'group{0}'.*)", tokenCount));
}
RegexOptions options =
ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
Match match = new Regex(format, options).Match(data);
if (tokenCount != (match.Groups.Count - 1))
{
values = new string[] { };
return false;
}
else
{
values = new string[tokenCount];
for (int index = 0; index < tokenCount; index++)
values[index] =
match.Groups[string.Format("group{0}", index)].Value;
return true;
}
}
}
Upvotes: 18
Reputation: 5374
After formatting, you can put the resulting string and the array of objects into a dictionary with the string as key:
Dictionary<string, string[]> unFormatLookup = new Dictionary<string, string[]>();
...
var arr = new string [] {"asdf", "qwer" };
var res = string.Format(format, arr);
unFormatLookup.Add(res,arr);
Then, in the Unformat method, you can simply pass in the string, look it up, and return the array that was used:
string [] Unformat(string res)
{
string [] arr;
unFormatLookup.TryGetValue(res, out arr); // You can also check the return value of TryGetValue and throw an exception if the input string is not in the dictionary.
return arr;
}
Upvotes: 0
Reputation: 41106
A simple solution might be to
format
This would resolve the ambiguities to the shortest possible match.
(I'm not good at RegEx, so please correct me, folks :))
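One way the shortest-match idea could be sketched is to turn each placeholder into a lazy capture group. This is an illustration under assumptions, not necessarily what the answer had in mind: the LazyUnformat class is hypothetical, it handles only simple placeholders such as {0} (no alignment or format specifiers, no escaped braces), and it returns values in the order the placeholders appear in the format string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not a framework API.
public static class LazyUnformat
{
    public static string[] Unformat(string format, string input)
    {
        // Split the format into the literal pieces around simple placeholders like {0}.
        string[] literals = Regex.Split(format, @"\{\d+\}");

        // Escape the literal pieces and join them with lazy capture groups,
        // so each placeholder matches as little text as possible.
        string pattern = "^" + string.Join("(.*?)", literals.Select(s => Regex.Escape(s))) + "$";

        Match m = Regex.Match(input, pattern, RegexOptions.Singleline);
        if (!m.Success)
            throw new FormatException("Input does not match the format.");

        // Group 0 is the whole match; the remaining groups are the captured values.
        return m.Groups.Cast<Group>().Skip(1).Select(g => g.Value).ToArray();
    }
}
```

For "{0}-{1}" and "asdf-qwer" this yields "asdf" and "qwer". For an ambiguous input such as "hello-world-stack-overflow" it yields "hello" and "world-stack-overflow", which is one consistent answer but not necessarily the original arguments.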
Upvotes: 1
Reputation: 5086
Assuming "-" is not in the original strings, can you not just use Split?
var arr2 = formattedString.Split('-');
Note that this only applies to the presented example with an assumption. Any reverse algorithm is dependent on the kind of formatting employed; an inverse operation may not even be possible, as noted by the other answers.
Upvotes: 2
Reputation: 422006
It's simply not possible in the generic case. Some information will be "lost" (string boundaries) in the Format method. Assume:
String.Format("{0}-{1}", "hello-world", "stack-overflow");
How would you "Unformat" it?
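To make the ambiguity concrete, the small sketch below shows two different argument pairs producing the same formatted string. The AmbiguityDemo class is hypothetical and exists only for illustration.

```csharp
using System;

// Hypothetical demo type: two different inputs, one identical output.
public static class AmbiguityDemo
{
    public static bool Collide()
    {
        string a = string.Format("{0}-{1}", "hello-world", "stack-overflow");
        string b = string.Format("{0}-{1}", "hello", "world-stack-overflow");

        // Both strings are "hello-world-stack-overflow", so no Unformat
        // can tell which pair of arguments was used.
        return a == b;
    }
}
```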
Upvotes: 4