Reputation: 1101
I have a text editor similar to the one used on Stack Overflow. I am processing the text string in C#, but I also want to let users format text within it using custom tags. For example:
<year /> will output the current year.
"Hello <year /> World" would render Hello 2012 World
What I would like to do is create a regular expression to search the string for any occurrence of <year /> and replace it. Further to that, I would also like to add attributes to the tag and be able to extract them, e.g. <year offset="2" format="5" />. I'm not great with RegEx, but hopefully someone out there knows how to do this?
Thanks
Upvotes: 1
Views: 2653
Reputation: 9218
Ideally you shouldn't be using regex for this, but seeing as Html Agility Pack doesn't have an HtmlReader, I guess you have to.
That being said, other markup solutions often use a list of regex patterns, each paired with its replacement, so we shouldn't write a 'general' case. For example, <([A-Z][A-Z0-9]*)>.*?</\1> would be the wrong thing to do here; instead we would want <year>.*?</year>.
Initially you would probably create a class to hold information about a recognised token, for example:
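using System;
using System.Collections.Generic;

public class Token
{
    // Attribute names are matched case-insensitively, so "Digits" and "digits" are the same key.
    private Dictionary<string, string> _attributes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Text between the opening and closing tags (empty for self-closing tags).
    public string InnerText { get; private set; }

    // Returns the attribute value, or null if the attribute was not supplied.
    public string this[string attributeName]
    {
        get
        {
            string val;
            _attributes.TryGetValue(attributeName, out val);
            return val;
        }
    }

    public Token(string innerText, IEnumerable<KeyValuePair<string, string>> values)
    {
        InnerText = innerText;
        // Note: Add throws if the same attribute name appears twice in one tag.
        foreach (var item in values)
        {
            _attributes.Add(item.Key, item.Value);
        }
    }

    // Returns the attribute parsed as an int, or defaultValue if it is missing or not a number.
    public int GetInteger(string name, int defaultValue)
    {
        string val;
        int result;
        if (_attributes.TryGetValue(name, out val) && int.TryParse(val, out result))
            return result;
        return defaultValue;
    }
}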
Now we need to create the regex. For example, a regex to match your year element would look like:
<Year(?>\s*(?<aname>\w*?)\s*=\s*"(?<aval>[^"]*)"\s*)*>(?<itext>.*?)</Year>
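So we can generalise this to the following pair, where {0} is the tag name (the first pattern matches an element with inner text, the second a self-closing element):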
<{0}\s*(?>(?<aname>\w*?)\s*=\s*"(?<aval>[^"]*)"\s*)*>(?<itext>.*?)</{0}>
<{0}\s*(?>(?<aname>\w*?)\s*=\s*"(?<aval>[^"]*)"\s*)*/>
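To make the capture behaviour concrete, here is a minimal sketch of filling in the self-closing pattern for one tag and reading the attributes back out. The class and method names (TagPatternDemo, TryReadAttributes) are made up for illustration; only the pattern itself comes from above. It relies on .NET keeping every capture of a repeated group, so the nth aname lines up with the nth aval.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagPatternDemo
{
    // The self-closing template from above; {0} is replaced with the escaped tag name.
    const string SelfClosingTemplate =
        @"<{0}\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*/>";

    // Hypothetical helper: reads the attributes of the first matching tag in input.
    // Returns false (and an empty dictionary) when no such tag is found.
    public static bool TryReadAttributes(string tagName, string input,
                                         out Dictionary<string, string> attributes)
    {
        attributes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        var pattern = string.Format(SelfClosingTemplate, Regex.Escape(tagName));
        var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        if (!match.Success)
            return false;

        // Each repetition of the attribute group adds one capture to aname and one to aval.
        var names = match.Groups["aname"].Captures;
        var values = match.Groups["aval"].Captures;
        for (int i = 0; i < names.Count; i++)
            attributes[names[i].Value] = values[i].Value; // last value wins on duplicate names

        return true;
    }
}
```

Calling TryReadAttributes("year", "Hello <year offset=\"2\" format=\"5\" /> World", out attrs) should leave attrs holding offset and format. A tag with no attributes, such as <year />, still matches and simply yields an empty dictionary.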
Given those general tag regexes we can write the markup class:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class MyMarkup
{
    // Templates used to build each tag's regex; {0} is replaced with the escaped tag name.
    const string RegexInnerText = @"<{0}\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*>(?<itext>.*?)</{0}>";
    const string RegexNoInnerText = @"<{0}\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*/>";

    // Each entry pairs a compiled tag regex with the evaluator that renders its matches.
    private static LinkedList<Tuple<Regex, MatchEvaluator>> _replacers = new LinkedList<Tuple<Regex, MatchEvaluator>>();

    static MyMarkup()
    {
        // <year digits="2" /> renders the last "digits" digits of the current year (default 4).
        Register("year", false, tok =>
        {
            var count = tok.GetInteger("digits", 4);
            var yr = DateTime.Now.Year.ToString();
            if (yr.Length > count)
                yr = yr.Substring(yr.Length - count);
            return yr;
        });
    }

    private static void Register(string tagName, bool supportsInnerText, Func<Token, string> replacement)
    {
        var eval = CreateEvaluator(replacement);
        // Add the no inner text variant.
        _replacers.AddLast(Tuple.Create(CreateRegex(tagName, RegexNoInnerText), eval));
        // Add the inner text variant.
        if (supportsInnerText)
            _replacers.AddLast(Tuple.Create(CreateRegex(tagName, RegexInnerText), eval));
    }

    private static Regex CreateRegex(string tagName, string format)
    {
        // Escape the tag name so any regex metacharacters in it are matched literally.
        return new Regex(string.Format(format, Regex.Escape(tagName)), RegexOptions.Compiled | RegexOptions.IgnoreCase);
    }

    // Runs every registered replacer over the input, in registration order.
    public static string Execute(string input)
    {
        foreach (var replacer in _replacers)
            input = replacer.Item1.Replace(input, replacer.Item2);
        return input;
    }

    private static MatchEvaluator CreateEvaluator(Func<Token, string> replacement)
    {
        return match =>
        {
            // Grab the groups/values.
            var aname = match.Groups["aname"];
            var aval = match.Groups["aval"];
            var itext = match.Groups["itext"].Value;
            // Pair the nth captured name with the nth captured value.
            var attrs = Enumerable.Range(0, aname.Captures.Count)
                .Select(i => new KeyValuePair<string, string>(aname.Captures[i].Value, aval.Captures[i].Value));
            return replacement(new Token(itext, attrs));
        };
    }
}
It's all really rough work, but it should give you a good idea of what you should be doing.
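The year handler registered above reads a digits attribute, whereas the question mentions offset. As a narrower, self-contained sketch of the same MatchEvaluator idea, the following handles only <year /> and <year offset="n" />. The name YearTagSketch and the meaning of offset (a number of years added to the current year) are assumptions, since the question doesn't say what offset should do. The current year is passed in so the output doesn't depend on the clock.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class YearTagSketch
{
    // Matches <year /> with an optional offset="n" attribute (n may be negative).
    // This is deliberately narrower than the general attribute pattern.
    static readonly Regex YearTag = new Regex(
        @"<year(?:\s+offset\s*=\s*""(?<offset>-?\d+)"")?\s*/>",
        RegexOptions.IgnoreCase);

    // Assumption: offset is a number of years to add to currentYear.
    public static string Render(string input, int currentYear)
    {
        return YearTag.Replace(input, match =>
        {
            int offset = 0;
            var group = match.Groups["offset"];
            if (group.Success)
            {
                // An unparsable (e.g. overflowing) value leaves offset at 0.
                int.TryParse(group.Value, NumberStyles.AllowLeadingSign,
                             CultureInfo.InvariantCulture, out offset);
            }
            return (currentYear + offset).ToString(CultureInfo.InvariantCulture);
        });
    }
}
```

A typical call would be YearTagSketch.Render(text, DateTime.Now.Year). Tags carrying any other attribute are left untouched by this pattern, which is the trade-off against the general version.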
Upvotes: 3
Reputation: 498992
string.Replace is sufficient for the first requirement - no need for a RegEx. Note that it is an instance method that returns a new string:
myString = myString.Replace("<year />", @"<year offset=""2"" />");
In order to extract the attribute value, you can split on ":
var val = @"<year offset=""2"" />".Split('"')[1]; // val == "2"
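Put together, the no-regex approach might look like the sketch below. The helper names (SimpleTagDemo, ReplaceYear, AttributeValues) are invented for illustration, and it assumes the tag is written exactly as <year /> and that attribute values are always wrapped in double quotes.

```csharp
using System;
using System.Collections.Generic;

public static class SimpleTagDemo
{
    // Replaces every literal "<year />" with the supplied year.
    // string.Replace returns a new string; the input is not modified.
    public static string ReplaceYear(string input, int year)
    {
        return input.Replace("<year />", year.ToString());
    }

    // Splitting on '"' puts the attribute values at the odd indexes:
    // <year offset="2" format="5" />  ->  "<year offset=", "2", " format=", "5", " />"
    public static List<string> AttributeValues(string tag)
    {
        var parts = tag.Split('"');
        var values = new List<string>();
        for (int i = 1; i < parts.Length; i += 2)
            values.Add(parts[i]);
        return values;
    }
}
```

In practice you would call ReplaceYear(text, DateTime.Now.Year). This only gives you the values in order, not their names, and any variation in spacing (for example <year/>) is missed, so it stops being sufficient fairly quickly once attributes are involved.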
Update (following comments):
You can try using the Html Agility Pack to parse and manipulate the text. It works well on HTML fragments, both well-formed and malformed, though I am not sure how it would deal with custom tags (worth a shot). It might be overkill, though.
Upvotes: 0