tmutton

Reputation: 1101

c# replace custom tags

I have a text editor similar to the one used on Stack Overflow. I process the text string in C#, but I also allow users to format text within it using custom tags. For example:

<year /> will output the current year.
"Hello <year /> World" would render Hello 2012 World

What I would like to do is create a regular expression that searches the string for any occurrence of <year /> and replaces it. Beyond that, I would also like to add attributes to the tag and be able to extract them, for example <year offset="2" format="5" />. I'm not great with regex, but hopefully someone out there knows how to do this?

Thanks

Upvotes: 1

Views: 2653

Answers (2)

Jonathan Dickinson

Reputation: 9218

Ideally you shouldn't be using regex for this, but seeing as Html Agility Pack doesn't have an HtmlReader, I guess you have to.

That being said, other markup solutions often use a list of regex patterns, each paired with its replacement, so we shouldn't write a 'general' case: <([A-Z][A-Z0-9]*)>.*?</\1> would be the wrong thing to do here; instead we would want a tag-specific pattern such as <year>.*?</year>.

Initially you would probably create a class to hold information about a recognised token, for example:

using System;
using System.Collections.Generic;

public class Token
{
    private Dictionary<string, string> _attributes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    public string InnerText { get; private set; }

    public string this[string attributeName]
    {
        get
        {
            string val;
            _attributes.TryGetValue(attributeName, out val);
            return val;
        }
    }

    public Token(string innerText, IEnumerable<KeyValuePair<string, string>> values)
    {
        InnerText = innerText;
        foreach (var item in values)
        {
            // Use the indexer so a repeated attribute name overwrites instead of throwing.
            _attributes[item.Key] = item.Value;
        }
    }

    public int GetInteger(string name, int defaultValue)
    {
        string val;
        int result;
        if (_attributes.TryGetValue(name, out val) && int.TryParse(val, out result))
            return result;
        return defaultValue;
    }
}

Now we need to create the regex. For example, a regex to match your year element would look like:

<Year(?>\s*(?<aname>\w*?)\s*=\s*"(?<aval>[^"]*)"\s*)*>(?<itext>.*?)</Year>

So we can generalise this to:

<{0}\s*(?>(?<aname>\w*?)\s*=\s*"(?<aval>[^"]*)"\s*)*>(?<itext>.*?)</{0}>
<{0}\s*(?>(?<aname>\w*?)\s*=\s*"(?<aval>[^"]*)"\s*)*/>
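To see what the self-closing pattern actually captures, here is a minimal sketch (the `TagPatternDemo` class and `ReadAttributes` method are names made up for illustration). It fills `{0}` with the tag name and pairs up the `aname` and `aval` captures by index, which relies on .NET keeping every capture of a repeated group:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the answer's classes.
public static class TagPatternDemo
{
    // The self-closing pattern from above; {0} is replaced by the tag name.
    private const string NoInnerText =
        @"<{0}\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*/>";

    // Returns the attributes of the first matching tag.
    // The result is empty if the tag is absent or has no attributes.
    public static Dictionary<string, string> ReadAttributes(string tagName, string input)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var regex = new Regex(
            string.Format(NoInnerText, Regex.Escape(tagName)),
            RegexOptions.IgnoreCase);

        var match = regex.Match(input);
        if (!match.Success)
            return result;

        // Each repetition of the attribute group adds one capture to each named group.
        var names = match.Groups["aname"].Captures;
        var values = match.Groups["aval"].Captures;
        for (var i = 0; i < names.Count; i++)
            result[names[i].Value] = values[i].Value;

        return result;
    }
}
```

Calling `TagPatternDemo.ReadAttributes("year", "Hello <year offset=\"2\" format=\"5\" /> World")` should give a dictionary with `offset` mapped to `2` and `format` mapped to `5`.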

Given those general tag regexes we can write the markup class:

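using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class MyMarkup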
{
    // These are used to build up the regex.
    const string RegexInnerText = @"<{0}\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*>(?<itext>.*?)</{0}>";
    const string RegexNoInnerText = @"<{0}\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*/>";

    private static LinkedList<Tuple<Regex, MatchEvaluator>> _replacers = new LinkedList<Tuple<Regex, MatchEvaluator>>();

    static MyMarkup()
    {
        Register("year", false, tok =>
        {
            var count = tok.GetInteger("digits", 4);
            var yr = DateTime.Now.Year.ToString();
            if (yr.Length > count)
                yr = yr.Substring(yr.Length - count);
            return yr;
        });
    }

    private static void Register(string tagName, bool supportsInnerText, Func<Token, string> replacement)
    {
        var eval = CreateEvaluator(replacement);

        // Add the no inner text variant.
        _replacers.AddLast(Tuple.Create(CreateRegex(tagName, RegexNoInnerText), eval));
        // Add the inner text variant.
        if (supportsInnerText)
            _replacers.AddLast(Tuple.Create(CreateRegex(tagName, RegexInnerText), eval));
    }

    private static Regex CreateRegex(string tagName, string format)
    {
        return new Regex(string.Format(format, Regex.Escape(tagName)), RegexOptions.Compiled | RegexOptions.IgnoreCase);
    }

    public static string Execute(string input)
    {
        foreach (var replacer in _replacers)
            input = replacer.Item1.Replace(input, replacer.Item2);
        return input;
    }

    private static MatchEvaluator CreateEvaluator(Func<Token, string> replacement)
    {
        return match =>
        {
            // Grab the groups/values.
            var aname = match.Groups["aname"];
            var aval = match.Groups["aval"];
            var itext = match.Groups["itext"].Value;

            // Turn aname and aval into a KeyValuePair.
            var attrs = Enumerable.Range(0, aname.Captures.Count)
                .Select(i => new KeyValuePair<string, string>(aname.Captures[i].Value, aval.Captures[i].Value));

            return replacement(new Token(itext, attrs));
        };
    }
}

It's all really rough work, but it should give you a good idea of what you should be doing.
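To tie this back to the `offset` attribute from the question, here is a smaller self-contained sketch of the same idea using `Regex.Replace` with a `MatchEvaluator`. The `YearTagSketch` class is hypothetical, and the meaning of `offset` is an assumption (years added to the current year), since the question doesn't define it. The year is passed in rather than read from `DateTime.Now` so the result is predictable:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch; "offset" is ASSUMED to mean "years to add".
public static class YearTagSketch
{
    // Matches the self-closing <year /> form with optional name="value" attributes.
    private static readonly Regex YearTag = new Regex(
        @"<year\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*/>",
        RegexOptions.IgnoreCase);

    // Replaces every <year /> tag with currentYear plus the optional offset attribute.
    public static string Render(string input, int currentYear)
    {
        return YearTag.Replace(input, match =>
        {
            var names = match.Groups["aname"].Captures;
            var values = match.Groups["aval"].Captures;

            // Look for an "offset" attribute; unknown attributes are ignored.
            var offset = 0;
            for (var i = 0; i < names.Count; i++)
            {
                int parsed;
                if (string.Equals(names[i].Value, "offset", StringComparison.OrdinalIgnoreCase)
                    && int.TryParse(values[i].Value, out parsed))
                {
                    offset = parsed;
                }
            }

            return (currentYear + offset).ToString();
        });
    }
}
```

With this, `YearTagSketch.Render("Hello <year /> World", 2012)` returns `Hello 2012 World`, and a tag written as `<year offset="2" />` renders as `2014`. In real use you would pass `DateTime.Now.Year` as the second argument.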

Upvotes: 3

Oded

Reputation: 498992

string.Replace is sufficient for the first requirement - no need for a RegEx.

myString = myString.Replace("<year />", DateTime.Now.Year.ToString());

To extract the attribute value, you can split on the double-quote character ("):

var val = @"<year offset=""2"" />".Split('"')[1]; // val is "2"
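With more than one attribute, the same split puts the values at the odd indexes (1, 3, ...), and the piece before each value ends with the attribute name. A rough sketch of that, assuming values never contain a double quote (the `SplitAttributeSketch` class is a made-up name):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; fragile by design, it only illustrates the split approach.
public static class SplitAttributeSketch
{
    // Splits a single tag on double quotes and pairs each value with its name.
    public static Dictionary<string, string> Parse(string tag)
    {
        var parts = tag.Split('"');
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Odd-numbered pieces are attribute values.
        for (var i = 1; i < parts.Length; i += 2)
        {
            // The preceding piece looks like "<year offset=" or " format=".
            var before = parts[i - 1].TrimEnd().TrimEnd('=').TrimEnd();
            var name = before.Substring(before.LastIndexOf(' ') + 1);
            result[name] = parts[i];
        }

        return result;
    }
}
```

For `<year offset="2" format="5" />` this gives `offset` mapped to `2` and `format` mapped to `5`. It breaks as soon as the quotes are unbalanced, so it is only reasonable for tags you control.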

Update (following comments):

You can try using the Html Agility Pack to parse and manipulate the text. It works well on HTML fragments, both well-formed and malformed, though I am not sure how it would deal with custom tags (worth a shot). It might be overkill, though.

Upvotes: 0
