Reputation: 24572
I have strings that look like this:
"<span>X</span>間違<span>う</span><span>ABCDE</span>"
How can I add spans to the elements that do not have spans already so the string looks like this:
"<span>X</span><span>間</span><span>違</span><span>う</span><span>ABCDE</span>"
Is this something that I can do with Regex?
Example 2 source:
"<span>X</span>A<span>う</span>ABC<span>Y</span>"
Example 2 result:
"<span>X</span><span>A</span><span>う</span><span>A</span><span>B</span><span>C</span><span>Y</span>"
Example 3 source:
"間違<span>う</span>"
Example 3 result:
"<span>間</span><span>違</span><span>う</span>"
Example 4 source:
"<span>う</span>間違"
Example 4 result:
"<span>う</span><span>間</span><span>違</span>"
Please note that only the characters that are not already inside a span need a span added around each of them. I hope that makes sense. So in the first case "ABCDE", which is already wrapped, needs to stay as "ABCDE".
Upvotes: 1
Views: 295
Reputation: 627086
Since the string you process is not actually HTML but just plain text with non-nested span tags, the problem can be solved with regex while treating <span> and </span> as starting and ending delimiters.
You may capture and keep the text between two tags and match any other char in other contexts:
var pattern = @"(?s)(<span(?:\s+[^>]*)?>.*?</span>)|\P{M}\p{M}*";
var result = Regex.Replace(text, pattern, x =>
x.Groups[1].Success ? x.Groups[1].Value : $"<span>{x.Value}</span>");
The pattern will become more efficient if you replace .*? with the unrolled [^<]*(?:<(?!/span>)[^<]*)* (the negated character classes also match line breaks, so the (?s) flag is no longer needed):
var pattern = @"(<span(?:\s+[^>]*)?>[^<]*(?:<(?!/span>)[^<]*)*</span>)|\P{M}\p{M}*";
Details
- (<span(?:\s+[^>]*)?>[^<]*(?:<(?!/span>)[^<]*)*</span>) - Group 1: matches and captures
  - <span - a literal substring
  - (?:\s+[^>]*)?> - an optional sequence of 1+ whitespaces followed with 0+ chars other than >, and then a literal >
  - [^<]* - 0+ chars other than <
  - (?:<(?!/span>)[^<]*)* - 0 or more occurrences of < not followed with /span>, and then any 0+ chars other than <
  - </span> - the literal text </span>
- | - or
- \P{M}\p{M}* - any Unicode grapheme (a char other than a combining mark, followed with 0+ combining marks).
The x.Groups[1].Success ? x.Groups[1].Value : $"<span>{x.Value}</span>" logic puts the Group 1 value back if Group 1 participated in the match; otherwise, it wraps the matched char with span tags.
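For reference, here is a minimal sketch that packages the pattern above into a reusable method so it can be checked against the four examples from the question. The class name SpanWrapper and the method name WrapLooseChars are hypothetical, chosen only for illustration. It assumes the input contains non-nested span tags only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class SpanWrapper
{
    // Group 1: an existing <span ...>...</span> block (kept as is).
    // Second alternative: one base char plus any combining marks that follow it.
    private static readonly Regex Pattern = new Regex(
        @"(<span(?:\s+[^>]*)?>[^<]*(?:<(?!/span>)[^<]*)*</span>)|\P{M}\p{M}*");

    public static string WrapLooseChars(string text) =>
        Pattern.Replace(text, m =>
            m.Groups[1].Success
                ? m.Groups[1].Value               // already wrapped: put it back unchanged
                : $"<span>{m.Value}</span>");     // loose char: wrap it
}
```

Calling SpanWrapper.WrapLooseChars on the Example 1 source should produce the Example 1 result from the question, with "ABCDE" left inside its single span.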
Upvotes: 1
Reputation: 35075
(Updated in the light of the new examples)
Regex will fail for HTML in general. Please see "RegEx match open tags except XHTML self-contained tags".
Something like this could do the job.
Regex.Replace(input, "(^|</span>)(.*?)(<span>|$)", "$1<span>$2</span>$3");
Please note that this will not split words that are not wrapped in spans; it will just wrap each such run of text in a single span. Since words that are already wrapped in spans are not split, this seems reasonable.
string input = "間違<span>う</span>X<span>ABC</span>Y<span>DEF</span>GHI";
Console.WriteLine(input);
var replaced = Regex.Replace(input, "(^|</span>)(.*?)(<span>|$)", "$1<span>$2</span>$3");
Console.WriteLine(replaced);
Output:
間違<span>う</span>X<span>ABC</span>Y<span>DEF</span>GHI
<span>間違</span><span>う</span><span>X</span><span>ABC</span><span>Y</span><span>DEF</span><span>GHI</span>
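If each loose character needs its own span, as the question's examples show, one possible variation on the same delimiter idea is to split the input on the existing span blocks and wrap the characters of the remaining pieces one by one. This is only a sketch: the names SpanSplitter and WrapPerChar are hypothetical, and it assumes well-formed, non-nested, attribute-free <span> tags and text where one char equals one visible character (no surrogate pairs or combining marks).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class SpanSplitter
{
    public static string WrapPerChar(string input)
    {
        // The capturing group makes Regex.Split keep the matched span blocks
        // in the returned array, next to the loose text between them.
        var parts = Regex.Split(input, "(<span>.*?</span>)", RegexOptions.Singleline);

        return string.Concat(parts.Select(p =>
            p.StartsWith("<span>", StringComparison.Ordinal)
                ? p                                                  // existing span: keep whole
                : string.Concat(p.Select(c => $"<span>{c}</span>")))); // loose text: one span per char
    }
}
```

With the sample input used above, this keeps <span>ABC</span> and <span>DEF</span> whole, while 間違 and GHI are split into one span per character.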
Upvotes: 1
Reputation: 3498
You can strip the tags to get the plain text, then add the tags to each character.
Example:
using System.Text;

var span = "<span>X</span>間違<span>う</span><span>Y</span>";
// Strip the existing tags to get the plain text.
var plain = span.Replace("<span>", "").Replace("</span>", "").Trim();
var sb = new StringBuilder();
// Wrap every character in its own span.
foreach (var ch in plain)
{
    sb.Append($"<span>{ch}</span>");
}
var result = sb.ToString();
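Note that stripping the tags first also splits text that was already inside one span (for example "ABCDE" would become five spans), and that iterating by char splits surrogate pairs and combining marks. As a hedged sketch of the same strip-and-wrap idea that at least keeps such sequences together, the loop can iterate over text elements with StringInfo.GetTextElementEnumerator instead. The names PlainTextWrapper and WrapEachTextElement are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper; the names are illustrative only.
public static class PlainTextWrapper
{
    public static string WrapEachTextElement(string spanText)
    {
        // Strip the existing tags to get the plain text.
        var plain = spanText.Replace("<span>", "").Replace("</span>", "");

        var sb = new StringBuilder();
        // A text element is a base character together with its combining
        // marks, or a surrogate pair, so these are never wrapped separately.
        var elements = StringInfo.GetTextElementEnumerator(plain);
        while (elements.MoveNext())
        {
            sb.Append("<span>").Append(elements.GetTextElement()).Append("</span>");
        }
        return sb.ToString();
    }
}
```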
Upvotes: 0