Michael Z

Reputation: 4614

regular expression repeating subexpression

I have the following text:

<pattern name="pattern1"/>
<success>success case 1</success>
<failed> failure 1</failed>
<failed> failure 2</failed>
<unknown> unknown </unknown>
<pattern name="pattern4"/>
<pattern name="pattern5"/>        
<success>success case 3</success> 
<pattern name="pattern2"/>        
<success>success case 2</success>
<otherTag>There are many other tags.</otherTag>
<failed> failure 3</failed>
<pattern name="pattern3"/> 
<unknown>unkown</unknown> 

The regular expression <failed>[\w|\W]*?</failed> matches all the lines that contain a failed tag.

What do I need to do if I want all failed tags together with the pattern tag above them? If there is no failed tag underneath a pattern tag, that pattern tag should not be matched. Basically, I want the following output:

<pattern name="pattern1"/>
<failed> failure 1</failed>
<failed> failure 2</failed>
<pattern name="pattern2"/>
<failed> failure 3</failed>

I am doing this in JavaScript, and I don't mind doing some intermediate steps.
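A minimal sketch of such intermediate steps, assuming one tag per line exactly as in the sample (the helper name `filterFailedWithPattern` is hypothetical): remember the most recent pattern line and write it out just before the first failed line that follows it.

```javascript
// Hypothetical helper: assumes one tag per line, exactly as in the sample.
function filterFailedWithPattern(text) {
  const output = [];
  let currentPattern = null;  // most recent <pattern .../> line seen
  let patternEmitted = false; // whether currentPattern is already in output

  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim(); // also drops trailing spaces and "\r"
    if (line.startsWith("<pattern")) {
      currentPattern = line;
      patternEmitted = false;
    } else if (line.startsWith("<failed")) {
      // Write the owning pattern once, just before its first failure.
      if (currentPattern !== null && !patternEmitted) {
        output.push(currentPattern);
        patternEmitted = true;
      }
      output.push(line);
    }
    // All other tags (<success>, <unknown>, <otherTag>, ...) are skipped.
  }
  return output;
}

// The sample text from the question.
const sampleText = [
  '<pattern name="pattern1"/>',
  '<success>success case 1</success>',
  '<failed> failure 1</failed>',
  '<failed> failure 2</failed>',
  '<unknown> unknown </unknown>',
  '<pattern name="pattern4"/>',
  '<pattern name="pattern5"/>',
  '<success>success case 3</success>',
  '<pattern name="pattern2"/>',
  '<success>success case 2</success>',
  '<otherTag>There are many other tags.</otherTag>',
  '<failed> failure 3</failed>',
  '<pattern name="pattern3"/>',
  '<unknown>unkown</unknown>',
].join("\n");

console.log(filterFailedWithPattern(sampleText).join("\n"));
```

Patterns with no failed line after them (pattern4, pattern5, pattern3) are never written, because a pattern is only emitted when a failed line is reached.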

Edit start: almost all repliers suggest that I take a different approach, and I am unsure which one to choose: jQuery, regex, or something else. Here is more information to help with the decision. The data format could change, but not often. The data comes from a Schematron validation report of file type ".SVRL", and the file structure follows this schema, written in RELAX NG compact syntax:

schematron-output = element schematron-output {
    attribute title { text }?,
    attribute phase { xsd:NMTOKEN }?,
    attribute schemaVersion { text }?,
    human-text*,
    ns-prefix-in-attribute-values*,
    (active-pattern,
     (fired-rule, (failed-assert | successful-report)*)+)+
}

The <pattern> tag maps to active-pattern, and the <failed> and <success> tags map to failed-assert and successful-report respectively.

With this additional information, which approach should I take? Thanks very much for helping out. :)

edit end

Upvotes: 1

Views: 1397

Answers (3)

smnh

Reputation: 1745

Here is the regular expression you need:

<(pattern|failed)\b[^>]*(?:/>|>[^<]*</\1>)

Just escape the forward slashes when using it as a JavaScript regular expression literal:

var regExp = /<(pattern|failed)\b[^>]*(?:\/>|>[^<]*<\/\1>)/gi;
var matchesArray = testString.match(regExp);

This regular expression finds whole <pattern> and <failed> tags, whether they are empty (<empty/>) or not (<notEmpty></notEmpty>). It also allows for attributes on the elements.
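On its own, this regex also returns pattern tags that have no failed tag underneath them. A minimal sketch of a second pass that drops those, assuming the matches come back in document order (the helper name `keepPatternsWithFailures` is hypothetical):

```javascript
// smnh's regex: matches whole <pattern .../> and <failed>...</failed> tags.
const tagRegExp = /<(pattern|failed)\b[^>]*(?:\/>|>[^<]*<\/\1>)/gi;

// Hypothetical second pass: keep a <pattern> match only when the next match
// is a <failed> tag, i.e. the pattern has at least one failure under it.
function keepPatternsWithFailures(matches) {
  return matches.filter((tag, i) => {
    if (!/^<pattern\b/i.test(tag)) return true; // always keep <failed> tags
    const next = matches[i + 1];
    return next !== undefined && /^<failed\b/i.test(next);
  });
}

// An excerpt of the question's sample text.
const reportText = [
  '<pattern name="pattern1"/>',
  '<success>success case 1</success>',
  '<failed> failure 1</failed>',
  '<pattern name="pattern4"/>',
  '<pattern name="pattern2"/>',
  '<failed> failure 3</failed>',
  '<pattern name="pattern3"/>',
].join("\n");

const allMatches = reportText.match(tagRegExp) || []; // match() returns null when nothing matches
const wanted = keepPatternsWithFailures(allMatches);
console.log(wanted.join("\n"));
```

The two non-global regexes inside the filter are deliberate: calling `test()` on a regex with the `g` flag would carry `lastIndex` state between calls.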

Upvotes: 1

broofa

Reputation: 38112

You can use the regex "|" operator (meaning "or") to create a regex that matches any one of several alternatives. For example ...

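/^<failed>[\w|\W]*?<\/failed>|^<pattern[^>]*>/gm

... should match the lines you're asking about (based on the example you've given above). The g flag returns every match, and the m flag lets ^ anchor at the start of each line rather than only at the start of the string. It still matches pattern tags that have no failed tag underneath, though, so those have to be filtered out in a second step.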
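The flags make a real difference for this regex: `^` only anchors at each line start when `m` is set, and `String.prototype.match` only returns every match when `g` is set. A quick comparison on part of the question's sample text:

```javascript
const lines = [
  '<pattern name="pattern1"/>',
  '<failed> failure 1</failed>',
  '<pattern name="pattern2"/>',
  '<failed> failure 3</failed>',
].join("\n");

// No flags: ^ anchors only at the very start of the string, and match()
// returns just the first hit (an array with index/input properties).
const withoutFlags = lines.match(/^<failed>[\w|\W]*?<\/failed>|^<pattern[^>]*>/);

// g + m: ^ anchors at the start of every line, and match() returns all hits.
const withFlags = lines.match(/^<failed>[\w|\W]*?<\/failed>|^<pattern[^>]*>/gm);

console.log(withoutFlags[0]);
console.log(withFlags);
```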

But, as other commenters have said, parsing XML with regexes is a slippery slope. You'll probably want to look into other options, like using DOMParser (parseFromString with "application/xml") to parse your string into a document for you.

Upvotes: 1

thomasrutter

Reputation: 117333

You should look into methods other than regular expressions to parse XML, particularly if:

  • your requirements are likely to change in future, making your regular expression increasingly unwieldy
  • you are parsing data from a third-party source, which may contain just about anything, including strings that look like XML tags embedded in XML comments, CDATA sections or attributes.
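To make the second point concrete, here is a small illustration using the question's own regex on hypothetical input (the comment and CDATA lines are invented for the demo, not taken from a real SVRL report). A regex has no notion of comments or CDATA, so text that only looks like a tag is matched too:

```javascript
// The question's regex: it knows nothing about XML comments or CDATA.
const failedRegExp = /<failed>[\w|\W]*?<\/failed>/g;

// Hypothetical input: only the third line is a real <failed> element.
const xmlText = [
  '<pattern name="pattern1"/>',
  '<!-- <failed>disabled check</failed> -->',
  '<failed> failure 1</failed>',
  '<note><![CDATA[<failed>just text</failed>]]></note>',
].join("\n");

const found = xmlText.match(failedRegExp);
// found has three entries, although an XML parser would report one element.
console.log(found);
```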

See this answer for information about XML parsing in JavaScript.

The easy solution is "use jQuery". If for some reason you don't want to load jQuery to do this, then start here.

Upvotes: 1
