Reputation: 4616
At the moment I am working on text that is broken into floating columns to display it in a magazine-like way.
In a previous question I asked how to split the text into sentences, and it works like a charm:
sentences = text.replace(/\.\s+/g,'.|').replace(/\?\s/g,'?|').replace(/\!\s/g,'!|').split("|");
Now I want to go a step further and split it into words. However, the text also contains some elements that should not be split, such as subheadlines.
An example text would be:
A wonderful serenity has taken possession of my entire soul. <strong>This is a subheadline</strong><br><br>I am alone, and feel the charm of existence in this spot.
My desired result would look like the following:
Array [
"A",
"wonderful",
"serenity",
"has",
"taken",
"possession",
"of",
"my",
"entire",
"soul.",
"<strong>This is a subheadline</strong>",
"<br>",
"<br>",
"I",
"am",
"alone,",
"and",
"feel",
"the",
"charm",
"of",
"existence",
"in",
"this",
"spot."
]
When I split at all whitespace I do get the words, but the "<br>" elements are not added as separate array entries. I also don't want to split the subheadline and its markup.
The reason I want to do this is that I add sequence after sequence to a p tag, and when its height becomes greater than that of the surrounding element, I remove the last added sequence and create a new floating p tag. When I split the text into sentences, I saw that the breaks were too coarse to ensure a good reading flow.
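The fill loop described above can be sketched roughly as follows. This is only an illustration, not the code from the fiddle: `fillColumns` and the `fits` callback are hypothetical names. In the browser, `fits` would presumably write the joined tokens into the current p tag and compare its height with that of the surrounding element; here it is injected so the loop itself stays independent of the DOM.

```javascript
// Minimal sketch: distribute tokens over columns.
// `fits(tokens)` is a hypothetical callback that returns true while the
// given tokens still fit into one column (for example by measuring the
// height of a p tag against its container).
function fillColumns(tokens, fits) {
    var columns = [];
    var current = [];
    tokens.forEach(function (token) {
        current.push(token);
        // Overflow: take the last token back out and start a new column with it.
        // A single token that never fits stays alone in its column, so the
        // loop cannot produce empty columns.
        if (current.length > 1 && !fits(current)) {
            current.pop();
            columns.push(current);
            current = [token];
        }
    });
    if (current.length) {
        columns.push(current);
    }
    return columns;
}
```

Because the tokens are whole words or whole tags, each column break falls between words and never inside a subheadline.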
You can see an example of what I am trying to achieve here
If you need any further information, I will be glad to provide it.
Thanks in advance,
Tobias
EDIT
The string could contain more HTML tags in the future. Is there a way to leave everything between these tags untouched?
EDIT 2
I created a jsfiddle: http://jsfiddle.net/m9r9q/1/
EDIT 3
Would it be a good idea to remove all HTML tags, together with the text they enclose, and replace them with placeholders? I could then split the string into words and re-insert the untouched HTML whenever a placeholder is reached. What would be the regex to extract all HTML tags?
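For illustration, the placeholder idea might look roughly like the sketch below. It rests on several assumptions that are not part of the question: the function name `splitKeepingTags` is made up, elements of the same name are not nested inside each other, and the text never contains the NUL character used to mark placeholders. A regex cannot parse arbitrary HTML, so this only covers simple markup like the example text.

```javascript
// Sketch of the placeholder approach (assumes simple, well-formed markup).
function splitKeepingTags(html) {
    var kept = [];
    // First alternative: an element with its content, e.g. <strong>...</strong>.
    // Second alternative: any remaining single tag, e.g. <br>.
    var pattern = /<([a-z][a-z0-9]*)\b[^>]*>[\s\S]*?<\/\1>|<[^>]+>/gi;
    var withPlaceholders = html.replace(pattern, function (match) {
        kept.push(match);
        // Surround the placeholder with spaces so it becomes its own token.
        return ' \u0000' + (kept.length - 1) + '\u0000 ';
    });
    return withPlaceholders
        .split(/\s+/)
        .filter(Boolean) // drop empty strings from leading/trailing whitespace
        .map(function (token) {
            var m = /^\u0000(\d+)\u0000$/.exec(token);
            // Swap each placeholder back for the untouched markup.
            return m ? kept[Number(m[1])] : token;
        });
}
```

Called with the example text from the question, this should yield the words as separate entries, the subheadline as one entry, and each `<br>` as its own entry.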
Upvotes: 0
Views: 11466
Reputation: 5343
"Although I want to try to extract the HTML parts and add them afterwards untouched"
Forget about that, and about my previous post. I just realized that it is much better to use the browser's built-in engine to operate on HTML code.
You can just use this:
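var text = 'A wonderful serenity has taken possession of my entire soul. <strong>This is a subheadline</strong><br><br>I am alone, and feel the charm of existence in this spot.';
// Let the browser parse the markup inside a detached element.
var elem = document.createElement('div');
elem.innerHTML = text;
var array = [];
for (var i = 0, childs = elem.childNodes; i < childs.length; i++) {
    if (childs[i].nodeType === 3 /* document.TEXT_NODE */) {
        // Split text nodes into words; filter(Boolean) drops the empty
        // string that a whitespace-only text node would otherwise produce.
        array = array.concat(childs[i].nodeValue.trim().split(/\s+/).filter(Boolean));
    } else {
        // Keep every element, including its nested markup, as one entry.
        array.push(childs[i].outerHTML);
    }
}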
This version DOES support nested tags, and it handles any valid HTML syntax without hard-coded exceptions for tags that have no closing tag, such as <br> :)
Upvotes: 3
Reputation: 5343
As I said in my comment, you shouldn't do this. But if you insist, here's a possible answer:
var text = 'A wonderful serenity has taken possession of my entire soul. <strong>This is a subheadline</strong><br><br>I am alone, and feel the charm of existence in this spot.';
var array = [],
    tagOpened = false,
    stringBuilder = [];
text.replace(/(<([^\s>]*)[^>]*>|\b[^\s<]*)\s*/g, function(all, word, tag) {
    if (tag) {
        var closing = tag[0] == '/';
        if (closing) {
            stringBuilder.push(all);
            word = stringBuilder.join('');
            stringBuilder = [];
            tagOpened = false;
        } else {
            tagOpened = tag.toLowerCase() != 'br';
        }
    }
    if (tagOpened) {
        stringBuilder.push(all);
    } else {
        array.push(word);
    }
    return '';
});
if (stringBuilder.length) array.push(stringBuilder.join(''));
It doesn't support nested tags. You can add that functionality by implementing a stack for the opened tags.
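A stack-based variant could look something like the sketch below. It is not the code from this answer but a separate illustration under stated assumptions: the markup is well formed, the name `splitWithNesting` is made up, and `VOID_TAGS` is only a partial, hand-picked list of tags that have no closing tag.

```javascript
// Assumption: partial list of tags that never have a closing tag.
var VOID_TAGS = ['br', 'hr', 'img', 'input', 'wbr'];

function splitWithNesting(html) {
    var tokens = [];
    var openTags = []; // stack of currently open tag names
    var buffer = '';   // markup collected while inside an element
    // Matches a tag, a run of non-whitespace text, or a run of whitespace.
    var pattern = /<(\/?)([a-z][a-z0-9]*)\b[^>]*>|[^<\s]+|\s+/gi;
    var match;
    while ((match = pattern.exec(html)) !== null) {
        var piece = match[0];
        var tagName = match[2] && match[2].toLowerCase();
        var isVoid = tagName && VOID_TAGS.indexOf(tagName) !== -1;
        if (!tagName || isVoid) {
            // Text, whitespace, or a void tag: part of the open element if
            // there is one, otherwise a token of its own.
            if (openTags.length) {
                buffer += piece;
            } else if (/\S/.test(piece)) {
                tokens.push(piece);
            }
            continue;
        }
        buffer += piece;
        if (match[1] === '/') {
            openTags.pop();
            // The outermost element has been closed: emit it as one entry.
            if (openTags.length === 0) {
                tokens.push(buffer);
                buffer = '';
            }
        } else {
            openTags.push(tagName);
        }
    }
    if (buffer) {
        tokens.push(buffer); // unclosed element: emit what was collected
    }
    return tokens;
}
```

The stack only tracks nesting depth here; it does not verify that each closing tag matches the tag on top of the stack, which a stricter version would want to check.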
Upvotes: 3