What am I doing wrong with this regular expression in JavaScript?

My string is:

<div> (blah blah blah) ---> quite big HTML before coming to this line.<b>Train No. &amp; Name : </b></td><td style="border-bottom:1px solid #ccc;font:12px arial"><span>12672 / SOUTH TRUNK EXP</span></td>

I managed to formulate a regular expression:

var trainDetails = new RegExp("<b>Train No. &amp; Name : </b></td><td.*>([0-9][a-z][A-Z]+)</span></td>", "m");

But the result of matching with trainDetails is null or empty.

All I am trying to do is to get the train name and the train number within the span element.

Any pointers on where I am going wrong?

Upvotes: 7

Views: 97

Answers (3)

sina

Reputation: 980

A regular expression is not the ideal solution for this use case. I suggest using your browser's built-in HTML parser to get the inner HTML of the <span>.

var el = document.createElement('html');
el.innerHTML = '<div> (blah blah blah) ---> quite big HTML before coming to this line.<b>Train No. &amp; Name : </b></td><td style="border-bottom:1px solid #ccc;font:12px arial"><span>12672 / SOUTH TRUNK EXP</span></td>';
var output = el.getElementsByTagName('span')[0].innerHTML;

The value of the output variable becomes:

12672 / SOUTH TRUNK EXP

Edit

If you are interested in a specific <span>, I suggest adding a class to its tag or its parent <td> tag, e.g.:

<span class="train-number-and-name">
   12672 / SOUTH TRUNK EXP
</span>

And fetch it like this:

var output = el.querySelector('span.train-number-and-name').innerHTML;

Upvotes: 4

Vegeta

Reputation: 1317

It worked for me:

Using RegExp

var string = '<div> (blah blah blah) ---> quite big HTML before coming to this line.<b>Train No. &amp; Name : </b></td><td style="border-bottom:1px solid #ccc;font:12px arial"><span>12672 / SOUTH TRUNK EXP</span></td>';

// Keep only the last run of text that is followed by nothing but closing tags.
var trainDetail = string.replace(/.*?([^>]+)(?:<\/[A-Za-z]+>)+$/g, '$1');

Using DOM

var string = '<b>Train No. &amp; Name : </b></td><td style="border-bottom:1px solid #ccc;font:12px arial"><span>12672 / SOUTH TRUNK EXP</span></td>';
// Rename <td> to a custom tag so the HTML parser does not drop it when it appears outside a <table>.
string = string.replace(/(<\/?)td/g, '$1xmltd');
var tempDoc = document.createElement('xml');
tempDoc.innerHTML = string;
var node = tempDoc.getElementsByTagName('xmltd');
// The last renamed cell holds the train details.
var trainDetails = node[node.length - 1].textContent;

Both versions assume that the last <td> in the string contains the train details.
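If relying on the last cell feels fragile, one alternative is to anchor on the label text, much as the question's pattern tried to. Two things appear to keep the original from matching: new RegExp(...) only builds the pattern (it still has to be run with exec or match), and the group ([0-9][a-z][A-Z]+) requires exactly one digit, then one lowercase letter, then uppercase letters, which cannot match "12672 / SOUTH TRUNK EXP". The following is only a sketch; the helper name extractTrainDetails and the choice to return null on failure are assumptions, not part of the original code.

```javascript
// Hypothetical helper (name and null-on-failure behaviour are assumptions).
function extractTrainDetails(html) {
  // Anchor on the label cell, skip the attributes of the following <td>,
  // then capture the text inside the <span> up to the next "<".
  var pattern = /<b>Train No\. &amp; Name : <\/b><\/td><td[^>]*><span>([^<]+)<\/span>/;
  var match = pattern.exec(html);
  return match ? match[1] : null;
}

var sample = '<div> (blah blah blah)<b>Train No. &amp; Name : </b></td>' +
  '<td style="border-bottom:1px solid #ccc;font:12px arial">' +
  '<span>12672 / SOUTH TRUNK EXP</span></td>';

console.log(extractTrainDetails(sample)); // 12672 / SOUTH TRUNK EXP
```

Using [^>]* and [^<]+ instead of .* keeps the match from running past the intended tag, and it also works when the surrounding HTML contains line breaks.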

Upvotes: 4

Xue Fang

Reputation: 108

This pattern should work: .+<span>(.+)<\/span>.+ Capture group 1 will contain the text inside the span.
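As a rough illustration of how that pattern could be applied, the sketch below runs it with exec and then splits the captured text into the train number and name. The helper name parseTrain and the assumption that the span text always looks like "digits / name" are mine, not the answer's. Note also that "." does not match line breaks, so this only works when the span and at least one character on each side of it sit on the same line.

```javascript
// Pattern from the answer above, written as a regex literal.
var spanPattern = /.+<span>(.+)<\/span>.+/;

// Hypothetical helper: returns { number, name }, or null when nothing matches.
function parseTrain(html) {
  var match = spanPattern.exec(html);
  if (!match) {
    return null;
  }
  // Assumes the span text has the form "<digits> / <name>".
  var parts = /^\s*(\d+)\s*\/\s*(.+?)\s*$/.exec(match[1]);
  return parts ? { number: parts[1], name: parts[2] } : null;
}

var html = '<div> (blah blah blah)<b>Train No. &amp; Name : </b></td>' +
  '<td style="border-bottom:1px solid #ccc;font:12px arial">' +
  '<span>12672 / SOUTH TRUNK EXP</span></td>';

var train = parseTrain(html);
console.log(train.number); // 12672
console.log(train.name);   // SOUTH TRUNK EXP
```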

Upvotes: 1
