Retrocoder

Reputation: 4703

Regular expression for extracting XML tag

I have some XML from which I want to extract element names using a JavaScript regular expression. An example of the XML is shown below:

<rules><and><gt propName="Unit" value="5" type="System.Int32"/><or><startsWith propName="DeviceType"/></or></and></rules>

I’m having problems extracting just the element names “gt” and “startsWith”. For example, with the following expression

<(.+?)\s

I get:

“<rules><and><gt”

rather than just “gt”.

Can anyone supply the correct expression?

Upvotes: 0

Views: 2259

Answers (4)

Tim Down

Reputation: 324507

The most robust method would be to use the browser's built-in XML parser and standard DOM methods for extracting the elements you want:

var parseXml;

if (window.DOMParser) {
    parseXml = function(xmlStr) {
        return ( new window.DOMParser() ).parseFromString(xmlStr, "text/xml");
    };
} else if (typeof window.ActiveXObject != "undefined" &&
        new window.ActiveXObject("Microsoft.XMLDOM")) {
    parseXml = function(xmlStr) {
        var xmlDoc = new window.ActiveXObject("Microsoft.XMLDOM");
        xmlDoc.async = false;
        xmlDoc.loadXML(xmlStr);
        return xmlDoc;
    };
} else {
    parseXml = function() { return null; };
}

var xmlStr = '<rules><and>' +
    '<gt propName="Unit" value="5" type="System.Int32"/><or>' + 
    '<startsWith propName="DeviceType"/></or></and></rules>';

var xmlDoc = parseXml(xmlStr);
if (xmlDoc) {
    var gt = xmlDoc.getElementsByTagName("gt")[0];
    alert( gt.getAttribute("propName") );
}
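Building on the parser approach above, here is a minimal sketch of how the element names themselves could be collected from the parsed document. The helper name namesOfElementsWithAttributes is hypothetical, and it assumes the wanted elements are the ones that carry attributes, as "gt" and "startsWith" do in the question's sample.

```javascript
// Hypothetical helper: walk a parsed XML node and collect the tag names
// of every descendant element that carries at least one attribute.
function namesOfElementsWithAttributes(node, names) {
    names = names || [];
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        if (child.nodeType === 1) { // 1 is the ELEMENT_NODE constant
            if (child.attributes && child.attributes.length > 0) {
                names.push(child.nodeName);
            }
            // Recurse so nested elements such as <or> children are visited.
            namesOfElementsWithAttributes(child, names);
        }
    }
    return names;
}

// Browser-only usage; skipped where DOMParser is not available.
if (typeof DOMParser !== "undefined") {
    var sampleDoc = new DOMParser().parseFromString(
        '<rules><and><gt propName="Unit" value="5" type="System.Int32"/>' +
        '<or><startsWith propName="DeviceType"/></or></and></rules>',
        "text/xml");
    var found = namesOfElementsWithAttributes(sampleDoc);
    // For the sample XML, found should contain "gt" and "startsWith".
}
```

The walk relies only on childNodes, nodeType, nodeName and attributes, so it should behave the same on a document produced by either branch of parseXml.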

Upvotes: 1

Boldewyn

Reputation: 82734

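Well, \s matches whitespace, and . matches any character, including < and >. So you are actually telling the regex engine to:

<(.+?)\s
^^    ^
||    \ until you find a whitespace character
|\ slurp in anything, including further < and > characters
\ starting at an opening pointy bracket

That is why the match runs from the first < all the way to the first space.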

You could, for example, use a character class that stops the name at the first whitespace or >:

<([^\s>]+)

but you should always consider this.
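As a rough sketch of the difference, the snippet below runs the question's pattern and a character-class variant against the sample XML. The variant adds two assumptions that are not part of the answer: it also excludes < and / from the name (so closing tags are skipped), and it keeps the trailing \s so that only tags carrying attributes match. The helper name tagNamesWithAttributes is hypothetical.

```javascript
// Sample XML from the question.
var xml = '<rules><and><gt propName="Unit" value="5" type="System.Int32"/>' +
    '<or><startsWith propName="DeviceType"/></or></and></rules>';

// The question's pattern: "." also matches "<" and ">", so the lazy group
// keeps growing from the first "<" until the first whitespace appears.
var original = /<(.+?)\s/.exec(xml);
// original[1] is 'rules><and><gt'

// Hypothetical helper: collect the names of opening tags that are
// immediately followed by whitespace, i.e. tags that carry attributes.
function tagNamesWithAttributes(str) {
    var re = /<([^\s<>\/]+)\s/g; // name may not contain whitespace, <, > or /
    var names = [];
    var match;
    while ((match = re.exec(str)) !== null) {
        names.push(match[1]);
    }
    return names;
}

var names = tagNamesWithAttributes(xml);
// names is ["gt", "startsWith"]
```

This only holds for input shaped like the sample; attribute values containing < followed by text and a space, comments, or CDATA sections would need a real parser.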

Upvotes: 2

teukkam

Reputation: 4317

Don't use a regex for this kind of thing. Instead, use the DOM processing functions, such as

var gtElements = document.getElementsByTagName('gt');
var startsWithElements = document.getElementsByTagName('startsWith'); 

Upvotes: 1

Kobi

Reputation: 138007

Regex is a poor tool for parsing XML. You can easily parse the XML in JavaScript, and a library like jQuery makes this task especially easy. For example:

var xml = '<rules><and><gt propName="Unit" value="5" type="System.Int32"/><or><startsWith propName="DeviceType"/></or></and></rules>';
var gt = $('gt', xml);
var t = gt.attr('type'); //System.Int32

Upvotes: 4
