user3256508

Reputation: 31

Is there a regular expression for this?

I'm having trouble finding a regular expression. I have some text that may be split up by XML tags. For example:

<root>
  <text>Thi</text>
  <text>s is ju</text>
  <text><bold>s</bold></text>
  <text>t a tes</text>
  <text><italic>t</italic></text>
</root>

I want to search for the word "just" in the XML and need this as the result:

ju</text>
<text><bold>s</bold></text>
<text>t

Is there any way to get this result with a regular expression?

By the way: I already have a regular expression that extracts the plain text from the XML (in C# syntax):

string plaintext = new Regex(@"\<[^\<]*\>").Replace(xmlstring, string.Empty);

This one matches everything from a "<" to the next ">" that contains no other "<" in between, and replaces it with string.Empty. So I get the plain text and could search it for "just", but the result would only be "just", without the XML in between.

Does anybody have an idea?

Upvotes: 2

Views: 126

Answers (3)

Ulugbek Umirov

Reputation: 12807

If your XML is on a single line (with no whitespace between tags), you can build the regex by separating the letters of "just" with the part (?:<[^>]*>)*. Example:

j(?:<[^>]*>)*u(?:<[^>]*>)*s(?:<[^>]*>)*t

If you need to process multi-line XML, separate the letters with (?! )(?:<[^>]*>\s*)*(?<! ) instead. This allows whitespace after each XML tag, but does not allow a space directly before or after a letter.

j(?! )(?:<[^>]*>\s*)*(?<! )u(?! )(?:<[^>]*>\s*)*(?<! )s(?! )(?:<[^>]*>\s*)*(?<! )t
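Writing the pattern out by hand gets tedious for longer words, so it can be generated instead. The following is only a minimal sketch in Python rather than the question's C#; the names TAG_GAP and build_split_word_pattern are made up for illustration, and the sample string is the XML from the question.

```python
import re

# Separator from the pattern above: any number of tags, each optionally
# followed by whitespace, with no literal space right next to a letter.
TAG_GAP = r"(?! )(?:<[^>]*>\s*)*(?<! )"

def build_split_word_pattern(word: str) -> re.Pattern:
    """Join the escaped characters of `word` with the tag-gap separator."""
    return re.compile(TAG_GAP.join(re.escape(ch) for ch in word))

# Sample XML copied from the question.
xml = """<root>
  <text>Thi</text>
  <text>s is ju</text>
  <text><bold>s</bold></text>
  <text>t a tes</text>
  <text><italic>t</italic></text>
</root>"""

match = build_split_word_pattern("just").search(xml)
result = match.group(0) if match else None
print(result)
```

Note that a pattern built this way can also match letters inside tag names (searching for "text" would hit the tag itself), so it is probably only safe for words that do not occur in the markup.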

Upvotes: 1

Trygve Flathen

Reputation: 696

Try this:

/j(<[^>]+>)*u(<[^>]+>)*s(<[^>]+>)*t/
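A quick way to see what this pattern covers is to run it against both layouts. This is a rough check in Python (an assumption, since the question uses C#), with the surrounding slashes dropped because they are only delimiters; the two sample strings are shortened variants of the XML in the question.

```python
import re

# Same pattern as above, without the /.../ delimiters.
pattern = re.compile(r"j(<[^>]+>)*u(<[^>]+>)*s(<[^>]+>)*t")

# Tags directly adjacent to each other, no whitespace in between.
single_line = (
    "<root><text>Thi</text><text>s is ju</text>"
    "<text><bold>s</bold></text><text>t a tes</text></root>"
)

# Same content, but with a newline between the <text> elements.
multi_line = (
    "<text>s is ju</text>\n"
    "<text><bold>s</bold></text>\n"
    "<text>t a tes</text>"
)

single_hit = pattern.search(single_line)
multi_hit = pattern.search(multi_line)

print(single_hit.group(0) if single_hit else None)
print(multi_hit.group(0) if multi_hit else None)
```

The pattern has no allowance for whitespace between tags, so it finds the word in the single-line string but not in the one with line breaks.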

Upvotes: 1

Lanorkin

Reputation: 7534

It's better not to use regex on XML. Just don't.

For your task, any number of XML tags can appear after each character of the string you are looking for. So basically you need to insert a "maybe tag" regex part after each letter, something like this:

j(\<[^\<]*?\>\s*)*u(\<[^\<]*?\>\s*)*s(\<[^\<]*?\>\s*)*t(\<[^\<]*?\>\s*)*

Working sample: http://www.rexfiddle.net/WdkpliZ

Upvotes: 1
