Jon McIntosh

Get element by id with regex

I had a quick question regarding RegEx...

I have a string that looks something like the following:

"This was written by <p id="auth">John Doe</p> today!"

What I want to do (with javascript) is basically extract out the 'John Doe' from any tag with the ID of "auth".

Could anyone shed some light? I'm sorry to ask.

Full story: I am using an XML parser to pass data into variables from a feed. However, there is one tag in the XML document () that contains HTML passed into a string. It looks something like this:

 <item>
  <title>This is a title</title>
  <description>
  "By <p id="auth">John Doe</p> text text text... so on"
  </description>
 </item>

So as you can see, I can't use an HTML/XML parser for that p tag, because it's in a string, not a document.

Upvotes: 1

Views: 7264

Answers (6)

Thomas Eding

Reputation: 1

If the content of the tag contains only text, you could use this:

function getText (htmlStr, id) {
  return new RegExp ("<[^>]+\\sid\\s*=\\s*([\"'])"
    + id 
    + "\\1[^>]*>([^<]*)<"
  ).exec (htmlStr) [2];
}


var htmlStr = "This was written by <p id=\"auth\">John Doe</p> today!";
var id = "auth";
var text = getText (htmlStr, id);
alert (text === "John Doe");
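A slightly more defensive variant of the same idea, as a sketch only: it escapes regex metacharacters in the id and returns null instead of throwing when nothing matches. The names escapeRegExp and getTextById are made up for this example, and it still assumes the tag contains nothing but text.

```javascript
// Escape characters that have special meaning in a regular expression,
// so an id such as "a.b" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the text inside the first tag whose id attribute equals `id`,
// or null when no such tag is found. Assumes the tag holds only text.
function getTextById(htmlStr, id) {
  var re = new RegExp(
    "<[^>]+\\sid\\s*=\\s*([\"'])" + escapeRegExp(id) + "\\1[^>]*>([^<]*)<"
  );
  var match = re.exec(htmlStr);
  return match ? match[2] : null;
}

var sample = "This was written by <p id=\"auth\">John Doe</p> today!";
console.log(getTextById(sample, "auth"));    // "John Doe"
console.log(getTextById(sample, "missing")); // null
```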

Upvotes: 0

Ryan Kinal

Reputation: 17732

Assuming you only have 1 auth per string, you might go with something like this:

var str = "This was written by <p id=\"auth\">John Doe</p> today!",
    p = str.split('<p id="auth">'),
    q = p[1].split('</p>'),
    a = q[0];
alert(a);

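Simple enough. Split your string on the opening paragraph tag, then split the second part on the closing tag, and the first piece of that result is your value. This works as long as the opening tag appears in the string exactly as written; if it doesn't, p[1] is undefined and the second split throws.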
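If the tag might be missing, the same idea can be written with indexOf so that it returns null rather than throwing. This is a rough sketch; textBetween is a name invented for this example, not part of any library.

```javascript
// Return the substring between the first `open` marker and the next
// `close` marker after it, or null if either marker is absent.
function textBetween(str, open, close) {
  var start = str.indexOf(open);
  if (start === -1) return null;
  start += open.length;
  var end = str.indexOf(close, start);
  if (end === -1) return null;
  return str.slice(start, end);
}

var str = "This was written by <p id=\"auth\">John Doe</p> today!";
console.log(textBetween(str, '<p id="auth">', '</p>')); // "John Doe"
```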

Upvotes: 0

Douglas

Reputation: 37761

Here's a way to get the browser to do the HTML parsing for you:

var string = "This was written by <p id=\"auth\">John Doe</p> today!";

var div = document.createElement("div");

div.innerHTML = string; // get the browser to parse the html

var children = div.getElementsByTagName("*");

for (var i = 0; i < children.length; i++)
{
    if (children[i].id == "auth")
    {
        alert(children[i].textContent);
    }
}

If you use a library like jQuery, you could hide the for loop and replace the use of textContent with something cross-browser.

Upvotes: 2

AlexV

Reputation: 23108

No need for regular expressions to do this. Use the DOM instead.

var obj = document.getElementById('auth');
if (obj)
{
    alert(obj.innerHTML);
}

By the way, having multiple ids with the same value on the same page is invalid (and will surely result in odd JS behavior).

If you want to have many auth elements on the same page, use a class instead of an id. Then you can use something like:

// IIRC getElementsByClassName is new in FF3; you might consider using jQuery to do this in a more "portable" way, but you get the idea...
var objs = document.getElementsByClassName('auth');
if (objs)
{
    for (var i = 0; i < objs.length; i++)
        alert(objs[i].innerHTML);
}

EDIT: Since you want to parse a string that contains some HTML, you won't be able to use my answer as-is. Will your HTML string contain a whole HTML document? Some part? Valid HTML? Partial (broken) HTML?

Upvotes: 2

Sarfraz

Reputation: 382806

What I want to do (with javascript) is basically extract out the 'John Doe' from any tag with the ID of "auth".

You can't have the same id (auth) on more than one element. An id must be unique within a page.

If, however, you assign a class of auth to elements, you can go about something like this assuming we are dealing with paragraph elements:

// find all paragraphs
var elms = document.getElementsByTagName('p');

for(var i = 0; i < elms.length; i++)
{
  // find elements with class auth
  if (elms[i].getAttribute('class') === 'auth') {
    var el = elms[i];

    // see if any paragraph contains the string
    if (el.innerHTML.indexOf('John Doe') != -1) {
      alert('Found ' + el.innerHTML);
    }
  }
}

Upvotes: 0

AKX

Reputation: 169184

Perhaps something like

document.getElementById("auth").innerHTML.replace(/<[^>]+>/g, '')

might work. innerHTML is supported on all modern browsers. (You may omit the replace if you don't care about removing HTML bits from the inner content.)
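The tag-stripping step can be tried on a plain string without a DOM. This is only a sketch: stripTags is a name made up here, and the pattern handles simple, well-formed tags only (a ">" inside an attribute value would confuse it).

```javascript
// Remove anything that looks like an HTML tag: "<", one or more
// characters that are not ">", then ">".
function stripTags(html) {
  return html.replace(/<[^>]+>/g, "");
}

console.log(stripTags("<b>John</b> <i>Doe</i>")); // "John Doe"
```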

If you have jQuery at your disposal, just do

$("#auth").text()

Upvotes: 0
