Reputation: 868
I have an HTML page (generated by Drupal) that, near the top of the page at a place of my choosing, has
<span class="marker-start"></span>
and near the end, at a place of my choosing, has
<span class="marker-end"></span>
In between those is some HTML written by users that will probably but not definitely be well formed.
The user can add further markers like the ones above in order to exclude content, e.g.:
<span class="marker-end"></span>
<div>This HTML here is excluded</div>
<span class="marker-start"></span>
Note that the exclusion block begins with a 'marker-end', as that matches the 'marker-start' at the beginning of the page to form a pair, and similarly the exclusion block ends with a 'marker-start' to pair with the 'marker-end' at the end of the document (or the start of another exclusion block).
While theoretically that exclusion block will be well formed, I will say again: written by users. Tags may legitimately be opened or closed in an uneven way (for example, the closing </div> may come AFTER the marker-start), and so on. Basically, there is no guarantee that the markers will be siblings.
The user can add multiple excluded spans within the document.
I need a way to read the text (NOT the HTML) between each pair of 'marker-start' and 'marker-end', and to concatenate those pieces of text into one string (which will therefore leave out any exclusion blocks). The markers may not (in fact almost certainly will not) be siblings in a balanced position, i.e. there will probably be tags that are opened but not closed, or vice versa, between them.
I have tried the methods suggested in "How to select all content between two tags in jQuery" and "Get text between two elements JQUERY" and hit problems with all of them.
In general, I have really struggled to have jQuery produce any useful results.
Can anyone suggest the simplest method to achieve this? I do have two solutions which I will outline in an answer for others to see but neither is perfect.
Upvotes: 0
Views: 204
Reputation: 93631
You could try walking the entire DOM recursively, excluding text based on the start and end markers encountered so far:
As a simple example (if I understand your exclusion logic correctly):
JSFiddle: http://jsfiddle.net/fdductdg/2/
// Visit node and all of its descendants in document order
function walkDOM(node, func) {
    func(node);
    node = node.firstChild;
    while (node) {
        walkDOM(node, func);
        node = node.nextSibling;
    }
}

var inMarker = false;
walkDOM(document.body, function (node) {
    var $node = $(node);
    if ($node.is('span')) {
        if ($node.hasClass('marker-end')) {
            inMarker = false;
            console.log("end marker");
        } else if ($node.hasClass("marker-start")) {
            inMarker = true;
            console.log("start marker");
        }
    }
    // nodeType 3 is a text node
    if (node.nodeType === 3) {
        if (!inMarker) {
            // Not inside a marker, remove the text content
            node.textContent = "";
        }
    }
});
Update:
As you also wish to retain the original text, you can either collect it in a variable (as you appear to have done in your comment) or wrap the matching text nodes in appropriate elements (e.g. a span with a suitable class), so that the excluded text can simply be shown or hidden with CSS without destroying the content.
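The first option in the update (collecting the text in a variable rather than clearing the excluded text nodes) could be sketched roughly as follows. This is only an illustration, not code from the fiddle: collectMarkedText and hasClass are made-up names, and the sketch uses plain DOM node properties instead of jQuery so that it never modifies the page.

```javascript
// Hypothetical helper: true if the node's class list contains name.
// className is a space-separated string on HTML elements.
function hasClass(node, name) {
    return typeof node.className === 'string' &&
        node.className.split(/\s+/).indexOf(name) !== -1;
}

// Hypothetical helper: walk root in document order and return the
// concatenated text found between marker-start and marker-end spans.
function collectMarkedText(root) {
    var parts = [];
    var inMarker = false;

    (function walk(node) {
        if (node.nodeType === 1 && node.nodeName.toLowerCase() === 'span') {
            // Element node: toggle the state on marker spans
            if (hasClass(node, 'marker-start')) {
                inMarker = true;
            } else if (hasClass(node, 'marker-end')) {
                inMarker = false;
            }
        } else if (node.nodeType === 3 && inMarker) {
            // Text node inside a start/end pair: keep its text
            parts.push(node.nodeValue);
        }
        for (var child = node.firstChild; child; child = child.nextSibling) {
            walk(child);
        }
    })(root);

    return parts.join('');
}
```

In a browser it would be called as collectMarkedText(document.body). Because the state is tracked in document order rather than by nesting, it should not matter whether the markers are siblings.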
Upvotes: 1
Reputation: 868
One really bad option would be to get the HTML as a string, and then go through using string analysis, find the markers, grab the HTML between them, and then use some kind of HTML parser to reduce that to text. Yuck!
A better solution I found was:
1) I added unique ids to the page's outermost opening and closing markers (the ones I control), eg
<span class="marker-start" id="primary-marker-start"></span>
...
<span class="marker-end" id="primary-marker-end"></span>
2) I used the following to get the text:
var start_class = 'marker-start';
var end_class = 'marker-end';
var start_tag = '<start>';
var end_tag = '<end>';
var absolute_start_id = "#primary-marker-start";
var absolute_end_id = "#primary-marker-end";
// put convenient markers into the actual text that will be returned,
// to enable simple parsing - note that this will dump anything already there
// so for example, <span class="marker-start"></span>
// becomes <span class="marker-start"><start></span>
jQuery("." + start_class).text(start_tag);
jQuery("." + end_class).text(end_tag);
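// get the text between the two outermost markers -
// including the convenient markers added above.
// nextUntil() stops at the end marker; nextAll().not(...) would also
// pick up any siblings that come after the end marker
var content = start_tag + jQuery(absolute_start_id).nextUntil(absolute_end_id).text() + end_tag;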
// remove the convenient markers so they don't show up on the page
jQuery("." + start_class).text("");
jQuery("." + end_class).text("");
// at this point, content holds all the text
// between and including absolute_start_id and absolute_end_id,
// with start_tag in place of the start markers, (eg '<start>')
// and end_tag in place of the end markers
// (including at the beginning and end of the text)
After this it is relatively simple to process that string and remove anything between an end marker and the next start marker.
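For completeness, that final processing step might look something like the sketch below. The function name extractIncluded is just an illustration, and it makes two assumptions: an unmatched start marker keeps everything up to the end of the string, and the page text never literally contains '<start>' or '<end>'.

```javascript
// Hypothetical helper: keep only the text between each start_tag and the
// next end_tag, and join those pieces together.
function extractIncluded(content, start_tag, end_tag) {
    start_tag = start_tag || '<start>';
    end_tag = end_tag || '<end>';
    var result = '';
    var pos = 0;
    while (true) {
        var start = content.indexOf(start_tag, pos);
        if (start === -1) {
            break; // no further included section
        }
        start += start_tag.length;
        var end = content.indexOf(end_tag, start);
        if (end === -1) {
            // Assumption: an unmatched start marker includes the rest of the text
            result += content.slice(start).split(start_tag).join('');
            break;
        }
        // Drop any repeated start markers that fall inside an included section
        result += content.slice(start, end).split(start_tag).join('');
        pos = end + end_tag.length;
    }
    return result;
}
```

It would be called as extractIncluded(content, start_tag, end_tag) with the variables defined above.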
Can anyone suggest a better idea or ways to improve on this? I am not a jQuery expert so would welcome tips or solutions.
Upvotes: 0