pri_dev

Reputation: 11645

Regular expression to remove all tags with content and HTML code from a string

I am looking to develop a regular expression which removes all HTML tags, script tags together with all content inside the script tag (basically all JavaScript code), and any HTML code like &nbsp; etc. No HTML or JavaScript code in the string should pass.

Update:

I think the question was not clear, so hopefully this makes it clearer.

I want '<' and '>' to be NOT allowed in the string, along with any special characters like ;,# ... etc. I don't care whether there is a tag like "<html>" or "<body>"; I just want to return false so that the user cannot enter any tag at all. I also want to block all JavaScript, so I am assuming that if I don't allow < and >, the script tag won't pass and JS code won't pass?

So the regex should simply not allow any <, > or other special characters like ;#@$%& etc., so that HTML code other than tags is also blocked, e.g. &nbsp;

Upvotes: 0

Views: 1523

Answers (4)

Fiaz Khattak

Reputation: 1

Regex.Replace(html, @"<script[^>]*>[\s\S]*?</script>|<[^>]+>", "", RegexOptions.IgnoreCase).Trim();

Here html is a string holding the HTML of a page, from which the HTML and script tags need to be removed.
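
For readers working in JavaScript, the same idea (drop script elements together with their content, then drop any remaining tags) could be sketched roughly as below. The name stripTags is hypothetical, and the patterns are an assumption about what "remove html and script tags" should cover; character entities such as &nbsp; are left untouched, and malformed markup may slip through.

```javascript
// Rough sketch: remove <script>...</script> blocks first, then any leftover tags.
function stripTags(html) {
    return html
        // script elements including everything between the open and close tag
        .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
        // any remaining tag such as <p>, </b>, <img src="...">
        .replace(/<[^>]+>/g, '')
        .trim();
}

console.log(stripTags('<p>Hello <b>world</b></p><script>alert(1)</script>')); // "Hello world"
```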

Upvotes: 0

Paco

Reputation: 508

^[^<>;#]*$

If the string matches that regex, it doesn't contain any of the characters listed in the brackets. I hope I understood your question correctly.
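
A minimal usage sketch in JavaScript, assuming the blocked set is extended to the characters named in the question (;#@$%& plus < and >). The names SAFE_INPUT and isInputAllowed are made up for illustration.

```javascript
// The whole string must consist only of characters outside the blocked set.
// The set itself is an assumption taken from the characters listed in the question.
var SAFE_INPUT = /^[^<>;#@$%&]*$/;

function isInputAllowed(str) {
    return SAFE_INPUT.test(str);
}

console.log(isInputAllowed('plain text 123'));      // true
console.log(isInputAllowed('<script>x</script>'));  // false
console.log(isInputAllowed('a&nbsp;b'));            // false
```

Because the pattern is anchored at both ends, a single blocked character anywhere in the string makes the test fail, which also rejects entities such as &nbsp; since they contain & and ;.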

Upvotes: 1

Ahmed Hashem

Reputation: 360

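To check whether a string contains HTML tags, you can use the following JavaScript function:

function containsHTMLTags(str)
{
        // True if the string contains "<", any run of non-">" characters, then ">".
        // A single character class avoids the nested quantifier of ([^>]{1,})*,
        // which can backtrack heavily on long inputs that have no closing ">".
        return /<[^>]*>/.test(str);
}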

The function uses black-list filtering.

References : http://www.hscripts.com/scripts/JavaScript/html-tag-validation.php

Upvotes: 1

alex

Reputation: 490163

Don't use a regular expression for that.

You can't use textContent or innerText because at least the former returns the body of script elements.

If I was only supporting newer browsers and had access to (or shimmed) Array.prototype.indexOf(), Array.prototype.reduce() and Array.prototype.map(), here is what I might use...

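var getText = function me(node, excludeElements) {

    // Normalise the exclusion list to lowercase tag names.
    // Note the parentheses: "!x instanceof Array" would negate x first.
    if (!(excludeElements instanceof Array)) {
        excludeElements = [];
    } else {
        // map() returns a new array, so the result has to be assigned.
        excludeElements = excludeElements.map(function(element) {
            return element.toLowerCase();
        });
    }

    return [].slice.call(node.childNodes).reduce(function(str, node) {
        var nodeType = node.nodeType;
        switch (nodeType) {
        case 3: // text node
            return str + node.data;
        case 1: // element node
            if (excludeElements.indexOf(node.tagName.toLowerCase()) == -1) {
                return str + me(node, excludeElements);
            }
        }
        // Excluded elements and other node types add nothing,
        // but the text collected so far must be kept.
        return str;
    }, '');

};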

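
To see the recursive walk in action outside a browser, here is a compact sketch that runs on a hand-built tree of plain objects standing in for DOM nodes. The names collectText, text and el are hypothetical helpers for this illustration only; the objects carry just the properties the walk reads (nodeType, data, tagName, childNodes).

```javascript
// Compact variant of the recursive text walk; works on anything node-shaped.
function collectText(node, excluded) {
    return Array.prototype.slice.call(node.childNodes || []).reduce(function (str, child) {
        if (child.nodeType === 3) {               // text node
            return str + child.data;
        }
        if (child.nodeType === 1 &&               // element node that is not excluded
            excluded.indexOf(child.tagName.toLowerCase()) === -1) {
            return str + collectText(child, excluded);
        }
        return str;                               // skipped node: keep the text so far
    }, '');
}

// Stand-ins for DOM nodes (assumption: only these properties are needed).
function text(data) { return { nodeType: 3, data: data }; }
function el(tagName, childNodes) {
    return { nodeType: 1, tagName: tagName, childNodes: childNodes };
}

// Mimics <div>Hello <script>alert(1)</script><b>world</b></div>
var tree = el('DIV', [
    text('Hello '),
    el('SCRIPT', [text('alert(1)')]),
    el('B', [text('world')])
]);

console.log(collectText(tree, ['script'])); // "Hello world"
```

In a browser the same function could presumably be called on a real element, for example collectText(document.body, ['script', 'style']).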

Upvotes: 0
