Mr. Smith

Reputation: 4506

Multiple String.Replace without interference?

What is a prudent approach to performing multiple String.Replace calls without replacing text that has already been replaced? For example, say I have this string:

str = "Stacks be [img]http://example.com/overflowing.png[/img] :/";

A Regex I wrote matches the [img]url[/img] and lets me replace it with the proper HTML <img> formatting.

str = "Stacks be <img src=\"http://example.com/overflowing.png\"/> :/";

Afterwards, I call String.Replace to replace emoticon codes (:/, :(, :P, etc.) with <img> tags. However, this produces unintended results:

Intended Result

str = "Stacks be <img src=\"http://example.com/overflowing.png\"/> " + 
    "<img src=\"emote-sigh.png\"/>";

Actual (and obvious) Result

str = "Stacks be <img src=\"http<img src=\"emote-sigh.png\"/>" +
    "/example.com/overflowing.png\"/> " +
    "<img src=\"emote-sigh.png\"/>";
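To make the interference concrete, here is a minimal C# sketch of the two-pass pipeline described above. The [img] pattern is an assumption (the question does not show its own regex), and only the :/ emoticon is handled.

```csharp
using System.Text.RegularExpressions;

// Hypothetical reproduction of the naive two-pass pipeline.
public static class NaivePipeline
{
    public static string Rewrite(string input)
    {
        // Pass 1 (assumed pattern): [img]url[/img] -> <img src="url"/>
        string html = Regex.Replace(input, @"\[img\](.*?)\[/img\]", "<img src=\"$1\"/>");

        // Pass 2: String.Replace has no idea that the "://" in the URL
        // belongs to markup produced by pass 1, so it rewrites that too.
        return html.Replace(":/", "<img src=\"emote-sigh.png\"/>");
    }
}
```

The second pass cannot tell the URL's :// apart from a real :/ emoticon, which is the root of the problem.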

Unfortunately, with the number of replacements I plan to make, it seems impractical to try to do it all in a single Regex expression (though I'd imagine that would be the most performant solution). What is a (slower but) more maintainable way to do this?

Upvotes: 0

Views: 225

Answers (8)

Tobiasz

Reputation: 1069

If you do not want to use a complex Regex, then you can, e.g., split the text into some kind of container.

You should split based on tokens found in the text: in your case, a token is the text between [img] and [/img] (including the tags themselves), that is, [img]http://example.com/overflowing.png[/img].

Then you can apply the [img] replacement to these tokens and the emoticon replacement to the rest of the elements in the container. Finally, you just output a string made by concatenating all the container elements.

Below you will find the example contents of such a container after the split:

 1. "Stacks be " 
 2. "[img]http://example.com/overflowing.png[/img]" 
 3. " :/" 

To elements 1 and 3 you apply the emoticon replacement; to element 2, the token, you apply the [img] replacement.
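A minimal C# sketch of this split-then-replace idea, assuming Regex.Split with a capturing group: when the split pattern contains capturing parentheses, the captured text is kept in the result array, so the [img] tokens land at odd indices and the plain text at even ones. The class name and the emoticon table are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the split-then-replace approach.
public static class SplitRewriter
{
    // The capturing group makes Regex.Split keep each [img]...[/img] token.
    private static readonly Regex ImgToken = new Regex(@"(\[img\].*?\[/img\])");

    // Sample table; no replacement string contains another emoticon code,
    // so the order in which they are applied does not matter here.
    private static readonly Dictionary<string, string> Emoticons = new Dictionary<string, string>
    {
        { ":/", "<img src=\"emote-sigh.png\"/>" },
        { ":(", "<img src=\"emote-frown.png\"/>" },
        { ":P", "<img src=\"emote-tongue.png\"/>" },
    };

    public static string Rewrite(string input)
    {
        string[] parts = ImgToken.Split(input);
        var result = new StringBuilder();
        for (int i = 0; i < parts.Length; i++)
        {
            if (i % 2 == 1)
            {
                // Token: strip "[img]" (5 chars) and "[/img]" (6 chars) to get the URL.
                string url = parts[i].Substring(5, parts[i].Length - 11);
                result.Append("<img src=\"").Append(url).Append("\"/>");
            }
            else
            {
                // Plain text: the only place emoticon codes are replaced.
                string text = parts[i];
                foreach (KeyValuePair<string, string> pair in Emoticons)
                {
                    text = text.Replace(pair.Key, pair.Value);
                }
                result.Append(text);
            }
        }
        return result.ToString();
    }
}
```

Because the emoticon replacement only ever sees the plain-text pieces, the :// inside the URL is never touched.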

Upvotes: 1

Tapan kumar

Reputation: 6999

Here is the code that did the replacement in my case, and the output is exactly what you want.

    string str = "Stacks be <img src=\"http://example.com/overflowing.png\"/> :/";

    // Check whether the HTML template holds anything; if so, set it, otherwise hide the div data.
    if (!String.IsNullOrEmpty(str))
    {
        divStaticAsset.InnerHtml = str.Replace("[img]", "<img src='")
                                      .Replace("[/img]", "'/>") + "<img src='emote-sigh.png'/>";
    }

Upvotes: 0

mlorbetske

Reputation: 5649

Another alternative is to use a sort of modified lexer to isolate each discrete region of your text where a certain replacement is warranted, and to mark that block so that replacements aren't run on it again.

Here's an example of how you'd do that:

First, we'll create a class that indicates whether a particular string has already been used (i.e., replaced) or not.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class UsageIndicator
{
    public string Value { get; private set; }

    public bool IsUsed { get; private set; }

    public UsageIndicator(string value, bool isUsed)
    {
        Value = value;
        IsUsed = isUsed;
    }

    public override string ToString()
    {
        return Value;
    }
}

Then we'll define a class that represents both how to locate a "token" in your text and what to do once it has been found.

public class TokenOperation
{
    public Regex Pattern { get; private set; }

    public Func<string, string> Mutator { get; private set; }

    public TokenOperation(string pattern, Func<string, string> mutator)
    {
        Pattern = new Regex(pattern);
        Mutator = mutator;
    }

    private List<UsageIndicator> ExtractRegions(string source, int index, int length, out int matchedIndex)
    {
        var result = new List<UsageIndicator>();
        var head = source.Substring(0, index);
        matchedIndex = 0;

        if (head.Length > 0)
        {
            result.Add(new UsageIndicator(head, false));
            matchedIndex = 1;
        }

        var body = source.Substring(index, length);
        body = Mutator(body);
        result.Add(new UsageIndicator(body, true));

        var tail = source.Substring(index + length);

        if (tail.Length > 0)
        {
            result.Add(new UsageIndicator(tail, false));
        }

        return result;
    }

    public void Match(List<UsageIndicator> source)
    {
        for (var i = 0; i < source.Count; ++i)
        {
            if (source[i].IsUsed)
            {
                continue;
            }

            var value = source[i];
            var match = Pattern.Match(value.Value);

            if (match.Success)
            {
                int modifyIBy;
                source.RemoveAt(i);
                var regions = ExtractRegions(value.Value, match.Index, match.Length, out modifyIBy);

                for (var j = 0; j < regions.Count; ++j)
                {
                    source.Insert(i + j, regions[j]);
                }

                i += modifyIBy;
            }
        }
    }
}

After taking care of those things, putting something together to do the replacement is pretty simple.

public class Rewriter
{
    private readonly List<TokenOperation> _definitions = new List<TokenOperation>();

    public void AddPattern(string pattern, Func<string, string> mutator)
    {
        _definitions.Add(new TokenOperation(pattern, mutator));
    }

    public void AddLiteral(string pattern, string replacement)
    {
        AddPattern(Regex.Escape(pattern), x => replacement);
    }

    public string Rewrite(string value)
    {
        var workingValue = new List<UsageIndicator> { new UsageIndicator(value, false) };

        foreach (var definition in _definitions)
        {
            definition.Match(workingValue);
        }

        return string.Join("", workingValue);
    }
}

In the demo code below, keep in mind that the order in which patterns and literals are added matters: whatever is added first gets tokenized first. So, to prevent the :// in the URL from being picked off as an emoticon plus a slash, we process the image block first; it contains the URL between the tags and is marked as used before the emoticon rule can get to it.

class Program
{
    static void Main(string[] args)
    {
        var rewriter = new Rewriter();
        rewriter.AddPattern(@"\[img\].*?\[/img\]", x => x.Replace("[img]", "<img src=\"").Replace("[/img]", "\"/>"));
        rewriter.AddLiteral(":/", "<img src=\"emote-sigh.png\"/>");
        rewriter.AddLiteral(":(", "<img src=\"emote-frown.png\"/>");
        rewriter.AddLiteral(":P", "<img src=\"emote-tongue.png\"/>");

        const string str = "Stacks be [img]http://example.com/overflowing.png[/img] :/";
        Console.WriteLine(rewriter.Rewrite(str));
    }
}

The sample prints:

Stacks be <img src="http://example.com/overflowing.png"/> <img src="emote-sigh.png"/>

Upvotes: 2

Abdul Saleem

Reputation: 10612

        string[] emots = { ":/", ":(", ":)" };
        string[] emotFiles = { "emote-sigh", "emote-sad", "emote-happy" };

        string replaceEmots(string val)
        {
            string res = val;
            for (int i = 0; i < emots.Length; i++)
                res = res.Replace(emots[i], "<img src=\"" + emotFiles[i] + ".png\"/>");
            return res;
        }

        void button1_click()
        {
            string str = "Stacks be <img src=\"http://example.com/overflowing.png\"/> :/";
            str = replaceEmots(str);
        }

Upvotes: 0

iTURTEV

Reputation: 331

Here is a code snippet from my old project:

private string Emoticonize(string originalStr)
{
    StringBuilder RegExString = new StringBuilder(@"(?<=^|\s)(?:");
    foreach (KeyValuePair<string, string> e in Emoticons)
    {
        RegExString.Append(Regex.Escape(e.Key) + "|");
    }
    RegExString.Replace("|", ")", RegExString.Length - 1, 1);
    RegExString.Append(@"(?=$|\s)");
    MatchCollection EmoticonsMatches = Regex.Matches(originalStr, RegExString.ToString());

    RegExString.Clear();
    RegExString.Append(originalStr);
    for (int i = EmoticonsMatches.Count - 1; i >= 0; i--)
    {
        RegExString.Replace(EmoticonsMatches[i].Value, Emoticons[EmoticonsMatches[i].Value], EmoticonsMatches[i].Index, EmoticonsMatches[i].Length);
    }

    return RegExString.ToString();
}

Emoticons is a Dictionary<string, string> in which I store the emoticon codes as keys and the corresponding image markup as values.
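The snippet relies on the lookarounds (?<=^|\s) and (?=$|\s): an emoticon only counts when it is surrounded by whitespace or by the start/end of the string, which is why the :/ inside http:// is left alone. Below is a self-contained sketch of the same pattern; it swaps the StringBuilder loop for Regex.Replace with a MatchEvaluator, and the dictionary contents are hypothetical sample data.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical self-contained variant of Emoticonize.
public static class BoundedEmoticons
{
    // Sample data: emoticon code -> image markup.
    private static readonly Dictionary<string, string> Emoticons = new Dictionary<string, string>
    {
        { ":/", "<img src=\"emote-sigh.png\"/>" },
        { ":(", "<img src=\"emote-frown.png\"/>" },
        { ":P", "<img src=\"emote-tongue.png\"/>" },
    };

    public static string Emoticonize(string input)
    {
        // Whitespace (or a string edge) is required on both sides of a code,
        // so ":/" preceded by "http" is never a candidate.
        string pattern = @"(?<=^|\s)(?:"
            + string.Join("|", Emoticons.Keys.Select(k => Regex.Escape(k)))
            + @")(?=$|\s)";

        // The evaluator looks up the replacement for whichever code matched.
        return Regex.Replace(input, pattern, m => Emoticons[m.Value]);
    }
}
```

The trade-off is that an emoticon glued to a word (e.g. "nice:P") is not converted.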

Upvotes: 0

npinti

Reputation: 52185

The most obvious approach would be to use a regular expression to replace whatever text you need. So in short, you could use a regex like so: :/[^/] to match :/ but not ://.

You could also use groups to know which pattern matched, and therefore which replacement to insert.
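A minimal sketch of the group-based idea, run on the string after the [img] pass. One caveat: :/[^/] consumes the character after :/ and needs such a character to exist, so it cannot match when :/ ends the string, as it does in the question's example. The sketch swaps in a negative lookahead, :/(?!/), which has neither problem. The group names and image file names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical example: one named group per emoticon in a single regex.
public static class GroupedEmoticons
{
    // ":/(?!/)" skips the "://" in a URL without consuming the next character.
    private static readonly Regex Emoticon =
        new Regex(@"(?<sigh>:/(?!/))|(?<frown>:\()|(?<tongue>:P)");

    public static string Rewrite(string input)
    {
        return Emoticon.Replace(input, m =>
        {
            // Exactly one alternative matched; its group says which one.
            if (m.Groups["sigh"].Success) return "<img src=\"emote-sigh.png\"/>";
            if (m.Groups["frown"].Success) return "<img src=\"emote-frown.png\"/>";
            return "<img src=\"emote-tongue.png\"/>";
        });
    }
}
```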

Upvotes: 3

Amadan

Reputation: 198324

Unfortunately, with the number of replacements I plan to make, it seems impractical to try to do it all in a single Regex expression (though I'd imagine that would be the most performant solution). What is a (slower but) more maintainable way to do this?

It might seem so, but it isn't. Take a look at this article.

tl;dr: Regex.Replace has overloads that accept a MatchEvaluator delegate in place of a replacement string. So match on a pattern that is a disjunction (alternation) of all the different things you want to replace simultaneously, and in the delegate use a Dictionary, a switch, or a similar strategy to select the correct replacement for the current match.

The strategy in the article depends on the keys being static strings; if there are regex operators in the keys, the concept fails, because the matched text no longer equals the key it should be looked up by. There is a better way: wrap each key in capture parentheses, then test which capture group succeeded to see which alternative matched.
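A minimal sketch of that capture-group approach, handling the whole question (both the [img] block and the emoticons) in a single pass. Each rule is wrapped in a named group (?<r0>...), (?<r1>...), and so on; the class and method names are hypothetical, loosely modeled on the Rewriter API from the lexer answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical single-pass rewriter built on one combined alternation.
public class SinglePassRewriter
{
    private readonly List<string> _patterns = new List<string>();
    private readonly List<Func<string, string>> _mutators = new List<Func<string, string>>();
    private Regex _combined;

    // When two rules could match at the same position, the one added first wins.
    public void AddPattern(string pattern, Func<string, string> mutator)
    {
        _patterns.Add(pattern);
        _mutators.Add(mutator);
        _combined = null; // force the combined regex to be rebuilt
    }

    public void AddLiteral(string literal, string replacement)
    {
        AddPattern(Regex.Escape(literal), x => replacement);
    }

    public string Rewrite(string input)
    {
        if (_patterns.Count == 0) return input;
        if (_combined == null)
        {
            _combined = new Regex(string.Join("|",
                _patterns.Select((p, i) => "(?<r" + i + ">" + p + ")")));
        }
        // Regex.Replace scans the input once and never rescans inserted text,
        // so one rule's output cannot be rewritten by another rule.
        return _combined.Replace(input, m =>
        {
            for (int i = 0; i < _mutators.Count; i++)
            {
                Group g = m.Groups["r" + i];
                if (g.Success) return _mutators[i](g.Value);
            }
            return m.Value; // not reached: some alternative always matched
        });
    }
}
```

Usage mirrors the lexer answer: add the [img] pattern with a mutator such as x => x.Replace("[img]", "<img src=\"").Replace("[/img]", "\"/>"), then the emoticon literals. Because the [img] alternative matches starting at the [ and consumes the whole block, the :/ inside the URL is never visited, and new rules can be added without hand-writing one giant regex.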

Upvotes: 3

Nitu Bansal

Reputation: 3856

You can replace it like this:

str = str.Replace("[img]", "<img src=\"").Replace("[/img]", "\"/>");

It should work.

Upvotes: 0
