I am trying to count the words in a textarea that accepts HTML input.
My first step is to strip the tags from the input. I found this code in another question:
```javascript
$("<div></div>").html(html).text();
```
It works well, but it is vulnerable to script tags in the HTML:
```javascript
html = "<script>alert()";
```
I am trying to mitigate this by using:
```javascript
$("<p>").html(html).remove('script').text();
```
This handles the example above. Unfortunately, it doesn't handle:
```javascript
html = "<script><script>alert();</script>";
```
because it only removes the outer script.
I'm trying to write a while loop that keeps removing scripts until none are left, but I'm struggling with the logic.
I want something like this:
```javascript
var $div = $("<div></div>").html(html);
var $scripts = $div.find('script');
// Repeat until a pass finds no <script> descendants left to remove
while ($scripts.length > 0) {
  $scripts.remove();
  $scripts = $div.find('script');
}
var text = $div.text();
```
Is this possible? And is this safe?
Is there any way to handle onXXX="" attributes in other elements too?
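The "keep removing until nothing changes" loop the question describes is a fixed-point iteration. Below is a DOM-free sketch of just that loop logic on a string; removeUntilStable and SCRIPT_BLOCK are hypothetical names, and the regular expression is an illustrative assumption, not a safe HTML sanitizer (it does nothing about onXXX="" attributes, for instance). The sample input shows why a single pass can fall short: deleting the inner <script></script> splices the surrounding fragments into a brand-new script element.

```javascript
// Hypothetical helper: apply one removal pass over and over until the
// string stops changing (a fixed point), i.e. "while removing causes a change".
function removeUntilStable(input, pattern) {
  let previous;
  let current = input;
  do {
    previous = current;
    current = previous.replace(pattern, "");
  } while (current !== previous);
  return current;
}

// Illustrative pattern only: an opening <script ...> tag, its contents,
// and the first closing </script> after it. This is NOT an HTML sanitizer.
const SCRIPT_BLOCK = /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi;

const tricky = "<scr<script></script>ipt>alert()</script>hi";
console.log(tricky.replace(SCRIPT_BLOCK, ""));        // <script>alert()</script>hi
console.log(removeUntilStable(tricky, SCRIPT_BLOCK)); // hi
```

A do...while loop fits here because at least one pass has to run before there is anything to compare against, and the loop always ends because every pass that changes the string makes it strictly shorter.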
I settled on the phpjs port of PHP's strip_tags function, which appears to work well and handles script tags nicely.
My simple word-count function so far is:
```javascript
// strip_tags() comes from the phpjs library and must be loaded separately
$('#input').on('input', function () {
  var text = $(this).val();
  // Remove tags, collapse all whitespace to single spaces, trim the ends
  text = strip_tags(text).replace(/\s+/g, ' ').trim();
  var wordCount = 0;
  if (text !== '') {
    wordCount = text.split(' ').length;
  }
  // .text() writes the number as plain text rather than as HTML
  $('#word-count').text(wordCount);
});
```
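The counting logic in the handler above does not depend on jQuery, so it can be pulled out into a plain function and tested on its own. A minimal sketch, assuming the input has already had its tags stripped (by strip_tags or otherwise); countWords is a hypothetical name:

```javascript
// Hypothetical helper: count words in text whose tags were already stripped.
// Collapses runs of whitespace (spaces, tabs, newlines) to one space,
// trims both ends, and treats an empty string as zero words.
function countWords(text) {
  const normalized = text.replace(/\s+/g, " ").trim();
  return normalized === "" ? 0 : normalized.split(" ").length;
}

console.log(countWords("  Hello,\n  world  ")); // 2
console.log(countWords("   "));                 // 0
```

Inside the input handler, this would simply be called on the output of strip_tags.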
You can use this regular expression:
```javascript
// Matches "<", then one or more characters that are not ">", then ">"
var regex = /(<([^>]+)>)/ig;
var body = "<p>test</p>";
var result = body.replace(regex, "");
alert(result);
```
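For word counting specifically, the regex above has two side effects worth knowing about. The sketch below wraps it in a hypothetical stripTags helper and, as an assumption not made in the answer, replaces each tag with a space instead of the empty string, so that words in adjacent elements such as <p>one</p><p>two</p> are not glued into one word. Because it only rewrites a string and never inserts anything into the DOM, nothing in the input runs; however, the text inside a <script> element is kept and would be counted.

```javascript
// Same tag pattern as the answer: "<", one or more non-">" characters, ">".
const TAG_PATTERN = /(<([^>]+)>)/gi;

// Hypothetical helper: replace every tag with a space (so text from adjacent
// elements stays separate), then collapse whitespace and trim.
function stripTags(html) {
  return html.replace(TAG_PATTERN, " ").replace(/\s+/g, " ").trim();
}

console.log(stripTags("<p>test</p>"));                // test
console.log(stripTags("<p>one</p><p>two</p>"));       // one two
console.log(stripTags("<script>alert()</script>hi")); // alert() hi
```

With the empty-string replacement used in the answer, the second input would instead become "onetwo" and count as a single word.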
I found another answer on Stack Overflow: How to strip HTML tags from div content using Javascript/jQuery?
Please sanitize the string before saving it to the database.