Mike

Reputation: 117

Regex Get Words Between HTML Tags

I have this string:

<p><ins>Article </ins>Title</p> 

<p>Here&#39;s some sample text</p>

I'd like to extract the words into an array, ignoring the HTML tags, i.e.:

['Article','Title','Here&#39;s','some','sample','text']

I tried to write a regex, but it doesn't work. Thanks in advance.

Upvotes: 0

Views: 579

Answers (2)

klugjo

Reputation: 20885

You don't need a regex for this; you can use the browser's DOM API:

const html = "<p><ins>Article </ins>Title</p> <p>Here&#39;s some sample text</p>";
const div = document.createElement("div");
div.innerHTML = html;

// This will extract the text (remove the HTML tags)
const text = div.textContent || div.innerText || "";
console.log(text);

// Then you can simply split the string
const result = text.split(' ');
console.log(result);
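
One caveat with `split(' ')`: it yields empty strings whenever the extracted text contains consecutive spaces, newlines, or leading/trailing whitespace. A minimal sketch of a more tolerant split follows. The helper name `splitWords` is hypothetical (it is not part of the answer above), and it assumes "word" simply means a run of non-whitespace characters:

```javascript
// Hypothetical helper, shown only as a sketch; not part of the original answer.
function splitWords(text) {
  // trim() drops leading/trailing whitespace; /\s+/ treats any run of
  // spaces, tabs, or newlines as a single separator; filter(Boolean)
  // removes the lone empty string that splitting "" would otherwise return.
  return text.trim().split(/\s+/).filter(Boolean);
}

console.log(splitWords("Article Title \nHere's  some sample text"));
```

In the snippet above, `splitWords(text)` could stand in for `text.split(' ')` if the input HTML is not guaranteed to use single spaces.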

Upvotes: 3

gurvinder372

Reputation: 68433

Put the markup in a dummy div and read its innerText:

var str = `<p><ins>Article </ins>Title</p> 
<p>Here&#39;s some sample text</p>`;

var div = document.createElement( "div" );
div.innerHTML = str; //assign str as innerHTML
var text = div.innerText; //get text only

var output = text.split( /\s+/ ); //split on runs of whitespace, including line feeds
console.log( output );
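
Note that the DOM approach decodes `&#39;` to an apostrophe, whereas the question's expected output keeps the entity as written. If no DOM is available (for example in Node.js), or the entities should stay untouched, a regex-only version can be sketched as below. The function name `wordsFromHtml` is hypothetical, and the sketch assumes simple, well-formed markup like the sample: a tag-stripping regex is not a real HTML parser and will misbehave on comments, `<script>` content, or a `>` inside an attribute value.

```javascript
// Hypothetical helper; a sketch for simple markup only, not a general HTML parser.
function wordsFromHtml(html) {
  return html
    .replace(/<[^>]*>/g, " ") // replace each tag with a space so neighbouring words stay separate
    .trim()                   // drop leading/trailing whitespace left behind by the tags
    .split(/\s+/)             // split on runs of spaces, tabs, or line feeds
    .filter(Boolean);         // an input with no text yields [] rather than [""]
}

const sample = "<p><ins>Article </ins>Title</p> \n<p>Here&#39;s some sample text</p>";
console.log(wordsFromHtml(sample));
```

Replacing tags with a space is a deliberate trade-off: it keeps words in adjacent block elements apart, but it would also split a word that has inline markup in the middle of it, such as `Ti<b>tl</b>e`.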

Upvotes: 5
