Alan2

Reputation: 24572

How can I add spans to characters inside a string that currently don't have <span> elements?

I have strings that look like this:

 "<span>X</span>間違<span>う</span><span>ABCDE</span>"

How can I add spans to the elements that do not have spans already so the string looks like this:

 "<span>X</span><span>間</span><span>違</span><span>う</span><span>ABCDE</span>"

Is this something that I can do with Regex?

Example 2 source:

"<span>X</span>A<span>う</span>ABC<span>Y</span>"

Example 2 result:

"<span>X</span><span>A</span><span>う</span><span>A</span><span>B</span><span>C</span><span>Y</span>"

Example 3 source:

"間違<span>う</span>"

Example 3 result:

"<span>間</span><span>違</span><span>う</span>"

Example 4 source:

"<span>う</span>間違"

Example 4 result:

"<span>う</span><span>間</span><span>違</span>"

Please note that I only need to add spans to the characters that are not already inside a span, one span per character. So in the first case "ABCDE" needs to stay as "ABCDE".

Upvotes: 1

Views: 295

Answers (3)

Wiktor Stribiżew

Reputation: 627086

Since the string you process is not actually HTML but plain text with non-nested span tags, the problem can be solved with regex by treating <span> and </span> as starting and ending delimiters.

You can capture and keep each existing <span>...</span> block, and match any other character everywhere else:

var pattern = @"(?s)(<span(?:\s+[^>]*)?>.*?</span>)|\P{M}\p{M}*";
var result = Regex.Replace(text, pattern, x => 
    x.Groups[1].Success ? x.Groups[1].Value : $"<span>{x.Value}</span>");

The pattern will become more efficient if you replace .*?</span> with the unrolled [^<]*(?:<(?!/span>)[^<]*)*</span>. Since that version no longer uses a dot, the (?s) flag can be dropped too:

var pattern = @"(<span(?:\s+[^>]*)?>[^<]*(?:<(?!/span>)[^<]*)*</span>)|\P{M}\p{M}*";

Details

  • (<span(?:\s+[^>]*)?>[^<]*(?:<(?!/span>)[^<]*)*</span>) - Group 1, which matches and captures:
    • <span - a literal substring, then
    • (?:\s+[^>]*)?> - an optional sequence of 1+ whitespace chars followed by 0+ chars other than >, then a literal >
    • [^<]* - 0+ chars other than <, followed by
    • (?:<(?!/span>)[^<]*)* - 0 or more occurrences of a < that is not followed by /span>, each followed by 0+ chars other than <, and then
    • </span> - the literal </span> text
  • | - or
  • \P{M}\p{M}* - a base char (any char that is not a combining mark) followed by 0+ combining marks, which approximates a Unicode grapheme.

The x.Groups[1].Success ? x.Groups[1].Value : $"<span>{x.Value}</span>" logic puts the Group 1 value back unchanged if Group 1 participated in the match; otherwise, it wraps the matched char with span tags.
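
For reference, here is a minimal, self-contained sketch of how the second pattern could be wired into a console program and run against the four examples from the question. The class name SpanWrapper and the method name WrapLooseChars are made up for illustration; only the pattern and the Regex.Replace call with a match evaluator come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpanWrapper
{
    // Group 1 keeps an existing <span>...</span> block intact; the alternative
    // matches one base character plus any combining marks that follow it.
    private static readonly Regex Pattern = new Regex(
        @"(<span(?:\s+[^>]*)?>[^<]*(?:<(?!/span>)[^<]*)*</span>)|\P{M}\p{M}*");

    // Hypothetical helper name: wraps every character that is not already
    // inside a span in its own <span>...</span>.
    public static string WrapLooseChars(string text) =>
        Pattern.Replace(text, m =>
            m.Groups[1].Success ? m.Groups[1].Value : $"<span>{m.Value}</span>");

    public static void Main()
    {
        // The four source strings from the question.
        string[] samples =
        {
            "<span>X</span>間違<span>う</span><span>ABCDE</span>",
            "<span>X</span>A<span>う</span>ABC<span>Y</span>",
            "間違<span>う</span>",
            "<span>う</span>間違",
        };

        foreach (var s in samples)
            Console.WriteLine(WrapLooseChars(s));
    }
}
```

For the first sample this yields <span>X</span><span>間</span><span>違</span><span>う</span><span>ABCDE</span>, which is the result asked for in the question. One caveat: .NET regexes operate on UTF-16 code units, so a character outside the Basic Multilingual Plane (a surrogate pair) would be matched as two separate halves by \P{M}.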

Upvotes: 1

tmaj

Reputation: 35075

(Updated in the light of the new examples)

Regex will fail for html. Please see RegEx match open tags except XHTML self-contained tags

I've been warned, I want to use regex for html

Something like this could do the job.

Regex.Replace(input, "(^|</span>)(.*?)(<span>|$)", "$1<span>$2</span>$3");

Please note that this will not split runs of characters that are not wrapped in spans; it will just wrap each run in a single span. Since words that are already wrapped in spans are not split, this seems reasonable.


Test

string input = "間違<span>う</span>X<span>ABC</span>Y<span>DEF</span>GHI";

Console.WriteLine(input);
var replaced = Regex.Replace(input, "(^|</span>)(.*?)(<span>|$)", "$1<span>$2</span>$3");

Console.WriteLine(replaced);

Output

間違<span>う</span>X<span>ABC</span>Y<span>DEF</span>GHI
<span>間違</span><span>う</span><span>X</span><span>ABC</span><span>Y</span><span>DEF</span><span>GHI</span>
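
One thing to watch with the pattern above: the middle group .*? can match the empty string, so when two tags sit directly next to each other (or the input starts or ends with a tag), the replacement inserts an empty <span></span>. The following is only a sketch of one possible variant, not part of the original answer: it swaps .*? for [^<]+ so that at least one untagged character is required. The names RunWrapper and WrapLooseRuns are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RunWrapper
{
    // Hypothetical helper: wraps each run of untagged text in a single span.
    // [^<]+ requires at least one non-'<' character, so back-to-back tags
    // such as "</span><span>" are left alone instead of gaining an empty span.
    public static string WrapLooseRuns(string input) =>
        Regex.Replace(input, "(^|</span>)([^<]+)(<span>|$)", "$1<span>$2</span>$3");

    public static void Main()
    {
        // First example from the question (contains adjacent spans).
        Console.WriteLine(WrapLooseRuns("<span>X</span>間違<span>う</span><span>ABCDE</span>"));

        // Test input from the answer above.
        Console.WriteLine(WrapLooseRuns("間違<span>う</span>X<span>ABC</span>Y<span>DEF</span>GHI"));
    }
}
```

Like the original, this variant only recognizes the literal <span> tag without attributes, and it still wraps whole runs rather than single characters.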

Upvotes: 1

iSR5

Reputation: 3498

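You can strip the tags to get the plain text, then wrap each character in its own span. Note that this also splits multi-character spans such as <span>ABCDE</span> into one span per character.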

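Example:

    // Requires: using System.Text;
    var span = "<span>X</span>間違<span>う</span><span>Y</span>";

    // Strip the existing tags to get the plain text.
    var plain = span.Replace("<span>", "").Replace("</span>", "").Trim();

    var sb = new StringBuilder();

    // Wrap every character in its own span.
    for (int x = 0; x < plain.Length; x++)
    {
        sb.Append($"<span>{plain[x]}</span>");
    }

    var result = sb.ToString();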
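
The loop above walks the string one char (UTF-16 code unit) at a time, so a character made of a surrogate pair would end up split across two spans. If that matters for your input, a possible sketch is to enumerate text elements with StringInfo.GetTextElementEnumerator instead. The names PerElementWrapper and WrapEachTextElement are hypothetical, and the same caveat applies: existing multi-character spans are split as well.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class PerElementWrapper
{
    // Hypothetical helper: strips the span tags, then wraps every text
    // element (as reported by StringInfo) in its own span.
    public static string WrapEachTextElement(string input)
    {
        var plain = input.Replace("<span>", "").Replace("</span>", "").Trim();

        var sb = new StringBuilder();
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(plain);
        while (elements.MoveNext())
        {
            sb.Append("<span>").Append(elements.GetTextElement()).Append("</span>");
        }

        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WrapEachTextElement("<span>X</span>間違<span>う</span><span>Y</span>"));
    }
}
```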

Upvotes: 0
