Reputation: 186
So far the only approach I have found that works is matching on whitespace. I would like to match arbitrary HTML tags and punctuation as separate tokens as well.
var text = "<div>The Quick brown fox ran through it's forest darkly!</div>";
// This one splits on whitespace only, so "darkly!</div>" is matched as one element
console.log(text.match(/\S+/g));
//outputs: ["<div>The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly!</div>"]
I want a matching expression that will output:
["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]
Here is a fiddle: https://jsfiddle.net/scottpatrickwright/og0bd0xj/2/
Ultimately I am going to store all of the matches in an array, do some processing (wrap every whole word in a span tag with a conditional data attribute) and re-output the original string in an altered form. I mention this because solutions that don't leave the string more or less intact won't work.
I am finding lots of near-miss solutions online, but my regex skills are not good enough to adapt them.
Upvotes: 0
Views: 60
Reputation: 46
My suggestion would be:
console.log(text.match(/(<.+?>|[^\s<>]+)/g));
The regex (<.+?>|[^\s<>]+) has two alternatives:
<.+?> matches each tag, such as <div> or </div>
[^\s<>]+ matches each run of characters that contains no whitespace, < or >
In the second alternative you can add any other characters you want to exclude from a token.
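As written, the pattern above still returns "darkly!" as a single token, because "!" is not excluded by the second alternative. One possible variant, offered as a sketch rather than a definitive tokenizer, splits the work into three alternatives: a tag, a word (letters, digits, underscore and apostrophe), or a single remaining punctuation character. The names tokenPattern and tokens are made up for this example, and the pattern assumes tags never contain a literal ">" inside an attribute value.

```javascript
const text = "<div>The Quick brown fox ran through it's forest darkly!</div>";

// Alternative 1: a whole tag, from "<" up to the next ">"
// Alternative 2: a word, with apostrophes allowed so "it's" stays whole
// Alternative 3: any single character that is not whitespace, a word character, "<" or ">"
const tokenPattern = /<[^>]+>|[\w']+|[^\s\w<>]/g;

const tokens = text.match(tokenPattern);
console.log(tokens);
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]
```

Whitespace is not captured, so joining the tokens back together does not reproduce the original spacing.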
Upvotes: 0
Reputation: 4277
How about:
/(<\/?)?[\w']+>?|[!\.,;\?]/g
Demonstrated here.
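A quick check of this pattern against the question's string, plus one input it does not handle, may help when deciding whether it is enough. The variable names are only for this sketch. The pattern recognises bare tags such as <div> and </div>, but a tag with attributes is broken into pieces and some of its characters are dropped.

```javascript
const sample = "<div>The Quick brown fox ran through it's forest darkly!</div>";
const simpleTagPattern = /(<\/?)?[\w']+>?|[!\.,;\?]/g;

// Works for the string in the question
const simpleTokens = sample.match(simpleTagPattern);
console.log(simpleTokens);
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]

// A tag with an attribute is split up, and the =, quotes and closing > are lost
const attributeTokens = '<div class="x">Hi</div>'.match(simpleTagPattern);
console.log(attributeTokens);
// ["<div", "class", "x", "Hi", "</div>"]
```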
Upvotes: 2
Reputation: 8246
You could just add a space before and after the HTML tags like so:
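var text = "<div>The Quick brown fox ran through it's forest darkly!</div>";
// Pad every tag with spaces so it becomes its own whitespace-delimited token
text = text.replace(/<(.*?)>/g, ' <$1> ');
// [\w']+ keeps "it's" whole (plain \w+ would split it into "it" and "'s"); \S+ picks up tags and punctuation
console.log(text.match(/[\w']+|\S+/g)); // ## Credit to George Lee for the original \w+|\S+ pattern ##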
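Note that padding the tags changes the string itself. If the end goal is to wrap each word in a span and re-output the original text, one option is to skip the intermediate array and use replace with a callback, which leaves tags, spacing and punctuation where they were. This is a minimal sketch: wrapWords is a hypothetical helper, the plain span stands in for whatever conditional data attribute is needed, and it assumes simple markup with no ">" inside attribute values.

```javascript
const source = "<div>The Quick brown fox ran through it's forest darkly!</div>";

// Hypothetical helper: matches either a whole tag or a word.
// Tags are returned unchanged; words are wrapped in a span.
// Everything the pattern does not match (spaces, "!") is left in place by replace().
function wrapWords(html) {
  return html.replace(/<[^>]+>|[\w']+/g, (token) =>
    token.startsWith("<") ? token : `<span>${token}</span>`
  );
}

const wrapped = wrapWords(source);
console.log(wrapped);
// <div><span>The</span> <span>Quick</span> <span>brown</span> <span>fox</span> <span>ran</span> <span>through</span> <span>it's</span> <span>forest</span> <span>darkly</span>!</div>
```

The callback is also the natural place to decide per word whether to add a data attribute. For anything beyond simple markup, parsing the HTML with the DOM and walking its text nodes is likely to be more robust than a regex.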
Upvotes: 0