Reputation: 578

Getting orphaned text out of parent tag with child elements separating

I have a programming challenge, and I'm wondering what the most bug-free way to approach it is.

Basically, I have the following HTML:

<p id="first">
    Hello lorem ispum 
    <a id="link" href="...">Link</a> 
    linkety link blag
</p>

(The ids are only there so this proof of concept can use getElementById; in reality, I get the DOM references by parsing the page element by element.)

The "Hello lorem ispum" and "linkety link blag" text fragments are orphaned -- I cannot directly access them. I can only access the whole thing with the paragraph tag, or the inside "a" tag.

What I would like is an array of elements covering everything in the paragraph. If the text needs wrapping tags or something similar so that I get a reference I can modify with JavaScript, that's OK. E.g., the end result would be:

para[0] = <span>Hello lorem ispum</span>
para[1] = <a id="link" href="...">Link</a>
para[2] = <span>linkety link blag</span>

These should be DOM objects that I can access and change, linked to what's on the page (NOT strings).

Would it just be a bunch of parsing the paragraph tag's innerHTML?

This is all for an open source Chrome plugin that helps people with reading disabilities move through text using just the up and down arrow keys. If you have any better ideas of how to approach this problem, please let me know!

Upvotes: 0

Answers (4)

spliter

Reputation: 12599

var paragraph = document.getElementById('first'),
    list = paragraph.childNodes,
    l = list.length,
    el, container, i = 0, result = [];

for (i; i < l; i++) {
    el = list[i];
    if (el.nodeType === 3) { // 3 = text node
        container = document.createElement('span');
        container.className = 'wrapper';
        // we want to remove line breaks from the text
        container.innerText = el.nodeValue.replace(/(\r\n|\n|\r)/gm,"");
        // the span is only collected in result; it does not replace the text node on the page
        el = container;
    }
    result.push(el);
}

JSFiddle

The reason we want to remove line breaks from the text nodes is that they would otherwise be converted into <br> elements when assigned to the <span> through innerText. I don't think that is what you need.

In your particular case, result will be something like:

[SPAN, A, SPAN]
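
One thing to watch with stripping line breaks outright: when a break is the only thing separating two words, deleting it glues them together ("foo\nbar" becomes "foobar"). A possible alternative, sketched here under the assumption that single spaces are acceptable in the output, is to collapse each run of whitespace into one space and trim the ends. The name collapseWhitespace is made up for this illustration.

```javascript
// Hypothetical helper (not from the answer above): collapse every run of
// whitespace, line breaks included, into a single space and trim both ends.
function collapseWhitespace(text) {
    return text.replace(/\s+/g, " ").trim();
}

// The first text node of the question's paragraph looks roughly like this:
var raw = "\n    Hello lorem ispum \n    ";
console.log(JSON.stringify(collapseWhitespace(raw))); // "Hello lorem ispum"
```

The returned string could then be assigned to the span in place of the result of the replace() call.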

Upvotes: 1

Musa

Reputation: 97717

Try this: it creates a span with the content of each text node and replaces the text node with it.

var p = document.getElementById('first');    
var elements = [];    
for (var i = 0; i < p.childNodes.length; i++) {
    var child = p.childNodes[i];
    if (child.nodeType == 3) {//text node
        if (! /^\s+$/.test(child.nodeValue)){//skip whitespaces
            var span = document.createElement('span');
            span.textContent = child.nodeValue; // textContent keeps characters like < and & as literal text
            p.replaceChild(span, child);
            elements.push(span);
        }
    }
    else if (child.nodeType == 1){//element node
        elements.push(child)
    }
}

http://jsfiddle.net/mowglisanu/t6UaJ/
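
The same idea can be packaged as a reusable function. The following is only a sketch: the name wrapTextNodes and its doc parameter are assumptions made for this example, and it differs from the snippet above in that it copies childNodes into a plain array before replacing anything and trims the text it puts into each span.

```javascript
// Hypothetical helper: wrap each non-blank text node directly under `parent`
// in a <span> and return the spans and child elements in document order.
// `doc` is the owning document (assumed to be `document` inside a page).
function wrapTextNodes(parent, doc) {
    // Copy the live NodeList so replacing children cannot disturb the iteration.
    var children = Array.prototype.slice.call(parent.childNodes);
    var result = [];
    children.forEach(function (child) {
        if (child.nodeType === 3) {                 // text node
            if (/\S/.test(child.nodeValue)) {       // ignore whitespace-only text
                var span = doc.createElement("span");
                span.textContent = child.nodeValue.trim();
                parent.replaceChild(span, child);   // the span is now live in the page
                result.push(span);
            }
        } else if (child.nodeType === 1) {          // element node
            result.push(child);
        }
    });
    return result;
}

// In a page, usage would look like:
// var para = wrapTextNodes(document.getElementById("first"), document);
```

Because every returned span has been put into the page with replaceChild, changing para[0].textContent or its style changes what is displayed.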

Upvotes: 1

Sushanth --

Reputation: 55750

You can iterate over the childNodes:

var para = document.getElementById('first');

var arr = [];

for (var i = 0; i < para.childNodes.length; i++) {
    var elem = para.childNodes[i];
    if (elem.nodeType === 3) {
        var newElem = document.createElement('span');
        newElem.className = 'a';
        newElem.textContent = trim(elem.nodeValue); // textContent keeps characters like < and & as literal text
        elem.parentNode.insertBefore(newElem, elem.nextSibling);
        para.removeChild(elem);
        arr.push(newElem);
    }
    else {
        arr.push(elem)
    }

}
console.log(arr);

function trim(str) {
    return str.replace(/^\s+|\s+$/g, "");
}

Check Fiddle

Upvotes: 1

jfriend00

Reputation: 708036

You can grab the text from the text nodes that aren't in other elements like this by walking the child nodes of the <p> tag and looking at the nodeType to see which nodes are text nodes:

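// named "para" rather than "top": at global scope, "top" refers to window.top and cannot be reassigned
var para = document.getElementById("first");
var node = para.firstChild;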
while (node) {
    // get text from text nodes that aren't contained in elements
    if (node.nodeType === 3) {
        // node.nodeValue is the text in the text node
    } else if (node.nodeType === 1) {
        // node is an element here so you can get innerHTML or textContent or whatever you want
    }
    node = node.nextSibling;
}

Working demo: http://jsfiddle.net/jfriend00/YvBpw/
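
To make the walk above concrete, here is a minimal sketch that fills in the text-node branch by collecting the text into an array. The function name collectDirectText is an assumption for this example, as is the choice to trim each string and skip whitespace-only nodes.

```javascript
// Hypothetical helper: return the trimmed text of every non-blank text node
// that is a direct child of `parent`; text inside child elements is skipped.
function collectDirectText(parent) {
    var texts = [];
    for (var node = parent.firstChild; node; node = node.nextSibling) {
        if (node.nodeType === 3 && /\S/.test(node.nodeValue)) {
            texts.push(node.nodeValue.trim());
        }
    }
    return texts;
}

// In a page, usage would look like:
// var texts = collectDirectText(document.getElementById("first"));
```

For the paragraph in the question this should give the two orphaned fragments, "Hello lorem ispum" and "linkety link blag", as plain strings. It does not give references you can modify on the page.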

If you just want the plain text from the whole <p> tag (including all elements) and want it to work cross-browser, you can use this:

var t = document.getElementById("first");
var text = t.textContent || t.innerText;

This will be an HTML-stripped text version of everything in the <p> tag.

Upvotes: 0
