user208662

Reputation: 10997

JavaScript - Detect HTML

I have an HTML textarea element. I want to prevent a user from entering any HTML tags in this area. How do I detect with JavaScript whether a user has entered any HTML in a textarea?

Thank you

Upvotes: 0

Views: 8110

Answers (5)

Adrian Maire

Reputation: 14875

Initial considerations:

  • XML != HTML, so I will assume that HTML tags are not allowed, but other XML tags are.
  • All HTML tags should be deleted, not just escaped (escaping HTML is much easier).
  • We don't want the user to lose the position of the caret while typing (that is very annoying).

First of all, define a function to replace html tags by '':

/**
* This function deletes HTML tags from a text, even if a tag is
* not well formed.
* It also updates the caret position so that it is preserved after the replacement.
* @param {string} text The text to modify
* @param {int} initPos The current position of the caret in the text
* @return {{text: string, pos: int}} The cleaned text and the new caret position
*/
function removeHtmlTags( text, initPos )
{
    // Define the regex to delete html tags
    if (undefined===removeHtmlTags.htmlTagRegexp)
    {
        removeHtmlTags.htmlTagRegexp = new RegExp('</?(?:article|aside|bdi|command|'+
            'details|dialog|summary|figure|figcaption|footer|header|hgroup|mark|'+
            'meter|nav|progress|ruby|rt|rp|section|time|wbr|audio|'+
            'video|source|embed|track|canvas|datalist|keygen|output|'+
            '!--|!DOCTYPE|a|abbr|address|area|b|base|bdo|blockquote|body|'+
            'br|button|canvas|caption|cite|code|col|colgroup|dd|del|dfn|div|'+
            'dl|dt|em|embed|fieldset|figcaption|figure|footer|form|h1|h2|h3|h4|'+
            'h5|h6|head|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|'+
            'li|link|map|menu|meta|noscript|object|ol|optgroup|option|p|param|'+
            'pre|q|s|samp|script|select|small|source|span|strong|style|sub|'+
            'sup|table|tbody|td|textarea|tfoot|th|thead|title|tr|u|ul|var|'+
            'acronym|applet|basefont|big|center|dir|font|frame|'+
            'frameset|noframes|strike|tt)(?:(?: [^<>]*)>|>?)', 'i');
    }

    // Delete html tags
    var thereIsMore=true;
    removeHtmlTags.htmlTagRegexp.lastIndex=0;
    // While I am not sure that all html tags are removed.
    while (thereIsMore)
    {
        var str = text.match(removeHtmlTags.htmlTagRegexp);
        if ( str!=null) // There is a match
        {
            text = text.replace(str[0], '');
            // Update the position
            if (str.index < initPos) 
                initPos= Math.max(initPos-str[0].length,str.index);
        }
        else thereIsMore = false;
    }

    // If caretPosition fails, initPos may be negative
    if (initPos<0) initPos=0;

    return {text: text, pos: initPos};
}

Note: I decided on the following replacements, for example:

'<div>' -> ''
'<div selected' -> ' selected'
'<div selected>' -> ''
'<div    >' -> ''

Second, we need a function to get/set the caret position, because updating the textarea content resets it. Furthermore, the position may change if any tag is deleted before the caret position.

/**
 * This function gets/sets the position of the caret in a node.
 * If a value is given, this function tries to set the new position.
 * In either case, it returns the (new) position.
 * @param {Element} node The textarea element
 * @param {int} value The new caret position
 * @return {int} The (new) caret position
 */
function caretPosition(node, value) 
{
    // Set default Caret pos, will be returned if this function fail.
    var caretPos = 0;

    // Ensure that value is valid
    value = parseInt(value, 10);

    // Set the new caret position if necessary
    if (!isNaN(value)) // We want to set the position
    {
        // Test the type: selectionStart is 0 (falsy) when the caret is at the start
        if (typeof node.selectionStart === 'number')
        {
            node.selectionStart = value;
            node.selectionEnd = value;
        }
        else if (node.setSelectionRange)
        {
            node.focus();
            node.setSelectionRange(value, value);
        }
        else if (node.createTextRange)
        {
            var range = node.createTextRange();
            range.collapse(true);
            range.moveEnd('character', value);
            range.moveStart('character', value);
            range.select();
        }
    }

    // Get the position to return it.
    if (typeof node.selectionStart === 'number') return node.selectionStart;
    else if (document.selection)
    {
        node.focus();
        var sel = document.selection.createRange();
        sel.moveStart('character', -node.value.length);
        caretPos = sel.text.length;
    }

    return caretPos;
}

Third, create a main function to remove HTML tags from the textarea and set the caret position.

/**
 * This event handler removes HTML tags from the textarea with id=text
 */
function updateText()
{
    // Get the textarea
    var t = document.getElementById('text');

    // Get the caret position
    var pos = caretPosition(t);

    // Remove html from the text
    var result = removeHtmlTags(t.value, pos);
    t.value = result.text;

    // Set the new caret position
    caretPosition(t, result.pos);
}

Finally, add event listeners to update the textarea on modification:

  • Key press
  • Paste
  • Drop

We should be able to use "oninput" for all three events, but (of course) it fails in IE.

HTML:

<html>
    <head>
        <script type="text/javascript">
           // Copy all the JavaScript code here.
        </script>
    </head>
    <body>
        <textarea cols="50" rows="10" oninput="updateText();" 
            ondrop="setTimeout(updateText, 0);" 
            onpaste="setTimeout(updateText, 0);" 
            onkeyup="updateText();" id="text"></textarea>
    </body>
</html>
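
The same wiring can also be done from script instead of inline attributes. The following is a rough sketch under assumptions: attachSanitizer is a hypothetical helper name, and the list of events simply mirrors the attributes used in the markup.

```javascript
// Hypothetical helper: registers one sanitizing callback for the events
// used in the markup. `node` is any object with addEventListener, normally
// the textarea element.
function attachSanitizer(node, sanitize) {
    var pending = false;
    function schedule() {
        if (pending) return;       // collapse several events of one keystroke
        pending = true;
        setTimeout(function () {   // run once the browser has updated the value
            pending = false;
            sanitize();
        }, 0);
    }
    ['input', 'keyup', 'paste', 'drop'].forEach(function (type) {
        node.addEventListener(type, schedule);
    });
    return schedule;
}

// In a page (assumes the textarea with id "text" and the updateText function):
// attachSanitizer(document.getElementById('text'), updateText);
```

Deferring every event with setTimeout keeps a single code path: paste and drop fire before the value changes, so they need the delay anyway.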

I hope this helps :-) Escain

Upvotes: 3

BalusC

Reputation: 1109865

One way is to let the keypress event return false when the pressed key matches < or >. To distinguish real HTML tags from innocent "less than" and "greater than" signs, you may need to put some regex in. And since you can't parse HTML reliably with regex... There is, however, a jQuery way:

var sanitized = $('<div>').html(textareavalue).text();

The normal practice, however, is to just let the client enter whatever they want and to sanitize HTML during display with the server-side view technology in question. How to do it depends on the view technology you're using. In PHP, for example, you can use htmlspecialchars() for this, and in JSP/JSTL fn:escapeXml(). This is more robust since JavaScript can be disabled/hacked/spoofed by the client.
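
For a view layer written in JavaScript, escaping at display time could look roughly like this. It is a sketch only: escapeHtml is a hypothetical helper, similar in spirit to htmlspecialchars() but not a port of it.

```javascript
// Hypothetical helper: replaces the five characters that are significant
// in HTML text and attribute values with character references.
function escapeHtml(text) {
    var entities = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#39;'
    };
    return String(text).replace(/[&<>"']/g, function (ch) {
        return entities[ch];
    });
}

console.log(escapeHtml('I <3 <b>bold</b> & "quotes"'));
// I &lt;3 &lt;b&gt;bold&lt;/b&gt; &amp; &quot;quotes&quot;
```

The stored text stays exactly as the user typed it; only the rendered output changes.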

Upvotes: 6

Amy B

Reputation: 17977

What do you consider an HTML tag? Is <b> a tag? What about the middle characters in I <3 how 5 is > 4?

I think you should not limit users with your strictness. Don't be a Steve Jobs.

Upvotes: 1

Robusto

Reputation: 31913

You can use a regular expression, like

if ( /<\/?[a-z][^>]*>/i.test(textArea.value) ) {
  // do something about it
}

where "textArea" is a reference to your textarea element (for example, obtained with document.getElementById).
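
To see what a pattern of this kind does and does not catch, it can be wrapped in a small function and run on a few inputs. This is a sketch; looksLikeHtml is a hypothetical name, and the pattern accepts single-letter tags such as <b>.

```javascript
// Hypothetical helper: true if the text contains something shaped like an
// opening or closing tag, i.e. "<" or "</", then a letter, then a later ">".
function looksLikeHtml(text) {
    return /<\/?[a-z][^>]*>/i.test(text);
}

console.log(looksLikeHtml('Hello <b>world</b>'));  // true
console.log(looksLikeHtml('I <3 how 5 is > 4'));   // false
console.log(looksLikeHtml('if a <b and c> d'));    // true (a false positive)
```

The last case shows the limit of the approach: any letter after "<" followed by a ">" somewhere later is reported, whether or not it is a real tag.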

Upvotes: 2

chris

Reputation: 10003

firstly, bear in mind that you'll need to re-validate on the server side, since anyone can fake an HTTP POST, and if they have JavaScript disabled then of course you have no control :)

what i'd do is

<textarea oninput="disableHtml(this);" name="text"></textarea>

and for the javascript

function disableHtml(element) {
  element.value = element.value.replace(/[<>]/g, '');
}

another way to do this would be to replace < and > with &lt; and &gt; on the server side, which is the better way because it's secure and people can still frown >:)

[edit : you can make the regexp as clever as you like, if you want to only detect certain tags for instance]
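
Detecting only certain tags, as mentioned in the edit, might be sketched like this. The helper name findForbiddenTags and the example tag list are assumptions for illustration.

```javascript
// Hypothetical helper: returns the lower-cased names of the listed tags
// that occur in the text. Tag names are assumed to be plain letters and
// digits, so they need no regex escaping.
function findForbiddenTags(text, tagNames) {
    // \b keeps "b" from matching the start of "<body>".
    var pattern = new RegExp('<\\/?(' + tagNames.join('|') + ')\\b[^>]*>', 'gi');
    var found = [];
    var match;
    while ((match = pattern.exec(text)) !== null) {
        var name = match[1].toLowerCase();
        if (found.indexOf(name) === -1) found.push(name);
    }
    return found;
}

console.log(findForbiddenTags('<p>Hi <SCRIPT>x()</script> <b>there</b></p>',
                              ['script', 'iframe', 'b']));
// [ 'script', 'b' ]
```

An empty result means none of the listed tags were found, so the caller can decide whether to warn the user or strip the text.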

Upvotes: -3
