User

Reputation: 3782

Need Pure/jQuery Javascript Solution For Cleaning Word HTML From Text Area

I know this issue has been touched on here, but I have not found a viable solution for my situation yet, so I'd like to put the brain trust back to work and see what can be done.

I have a textarea in a form that needs to detect when something is pasted into it, and clean out any hidden HTML and quotation marks. The content of this form gets emailed to a 3rd party system which is particularly bitchy, so even encoding the characters as HTML entities isn't always a safe bet.

I unfortunately cannot use something like FCKEditor, TinyMCE, etc, it's gotta stay a regular textarea in this instance. I have attempted to dissect FCKEditor's paste from word function but have not had luck tracking it down.

I am however able to use the jQuery library if need be, but haven't found a jQuery plugin for this just yet.

I am specifically looking for information geared towards cleaning the information pasted in, not how to monitor the element for change of content.

Any constructive help would be greatly appreciated.

Upvotes: 2

Views: 10294

Answers (5)

zilverdistel

Reputation: 11

It might be useful to use the blur event, which would be triggered less often:

$("textarea").blur(function() {
    // check input ($(this).val()) for validity here
});

Upvotes: 1

Tim Molendijk

Reputation: 1046

You could check out Word HTML Cleaner by Connor McKay. It is a pretty strong cleaner, in that it removes a lot of stuff that you might want to keep, but if that's not a problem it looks pretty decent.
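For a sense of what such a cleaner has to deal with, here is a minimal sketch. It is not Connor McKay's code; the function name `stripWordMarkup` and the particular patterns are assumptions chosen for illustration. It drops comments (including Word's conditional comments), whole style/script/xml blocks, and then any remaining tags, leaving plain text.

```javascript
// Hypothetical helper, not taken from any library: reduce Word-flavoured
// HTML to plain text using a handful of regular expressions.
function stripWordMarkup(html) {
    return html
        // HTML comments, including Word's <!--[if gte mso 9]> ... <![endif]-->
        .replace(/<!--[\s\S]*?-->/g, "")
        // style, script and xml blocks, together with their contents
        .replace(/<(style|script|xml)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
        // every remaining tag, e.g. <p class="MsoNormal"> or <o:p>
        .replace(/<\/?[^>]+>/g, "")
        // non-breaking space entities become ordinary spaces
        .replace(/&nbsp;/gi, " ")
        // collapse runs of spaces and tabs left behind by removed markup
        .replace(/[ \t]+/g, " ")
        .trim();
}
```

Like the cleaner mentioned above, this is aggressive: all formatting is lost, which is fine when the target is a plain textarea.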

Upvotes: 5

Shane Tomlinson

Reputation: 3350

I am looking at Dave Archer's answer and he pretty much answers it. In the past I have used a solution similar to his:

$("textarea").change( function() {
    // Convert opening and closing angle brackets to their HTML-encoded equivalents.
    var strClean = $(this).val().replace(/</gi, '&lt;').replace(/>/gi, '&gt;');

    // Remove any double and single quotation marks.
    strClean = strClean.replace(/"/gi, '').replace(/'/gi, '');

    // put the data back in.
    $(this).val(strClean);
});

If you are looking for a way to completely REMOVE HTML tags:

$("textarea").change( function() {
    // Completely strips tags.  Taken from Prototype library.
    var strClean = $(this).val().replace(/<\/?[^>]+>/gi, '');

    // Remove any double and single quotation marks.
    strClean = strClean.replace(/"/gi, '').replace(/'/gi, '');

    // put the data back in.
    $(this).val(strClean);
});
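One caveat with the quote removal above: it only matches straight quotes, while text pasted from Word usually contains curly ("smart") quotes. A possible sketch that covers both is below; the helper name `stripQuotes` is made up for this example, and the Unicode range used is an assumption about which characters you want gone.

```javascript
// Hypothetical helper: remove straight quotes plus the Unicode quotation
// marks in the range U+2018..U+201F (curly single and double quotes and
// their low-9 and reversed variants).
function stripQuotes(text) {
    return text.replace(/["'\u2018-\u201F]/g, "");
}

// Possible use inside the change handler (requires jQuery and a page):
// $(this).val(stripQuotes($(this).val()));
```

Note that this also removes apostrophes, so "it's" becomes "its"; whether that is acceptable depends on what the receiving system tolerates.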

Upvotes: 8

peirix

Reputation: 37731

What about something like this:

function cleanHTML(pastedString) {
    var cleanString = "";
    var insideTag = false;
    for (var i = 0, len = pastedString.length; i < len; i++) {
        var ch = pastedString.charAt(i);
        if (ch === "<") {
            // Start of a tag: stop copying characters.
            insideTag = true;
        } else if (ch === ">" && insideTag) {
            // End of the tag: resume copying after it.
            insideTag = false;
        } else if (!insideTag) {
            cleanString += ch;
        }
    }
    return cleanString;
}

Then just use the event listener to call this function and pass in the pasted string.
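A rough sketch of that wiring, assuming a browser that exposes `event.clipboardData` on the paste event (older browsers may not, in which case the normal paste is left alone). The name `attachPasteCleaner` is invented for this example; `clean` is whatever cleaning function you pass in.

```javascript
// Hypothetical helper: intercept paste on a textarea, clean the pasted
// text, and insert it at the current selection.
function attachPasteCleaner(textarea, clean) {
    textarea.addEventListener("paste", function (event) {
        var clipboard = event.clipboardData;
        if (!clipboard) {
            return; // no clipboard access: let the browser paste as usual
        }
        event.preventDefault();

        var pasted = clean(clipboard.getData("text/plain"));
        var start = textarea.selectionStart;
        var end = textarea.selectionEnd;

        // Replace the selected range (or insert at the caret) with the cleaned text.
        textarea.value = textarea.value.slice(0, start) + pasted + textarea.value.slice(end);

        // Put the caret right after the inserted text.
        var caret = start + pasted.length;
        textarea.selectionStart = caret;
        textarea.selectionEnd = caret;
    });
}

// Possible use in a page:
// attachPasteCleaner(document.querySelector("textarea"), cleanHTML);
```

Cleaning only the pasted fragment, rather than the whole value, leaves anything the user typed by hand untouched.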

Upvotes: 1

Dave Archer

Reputation: 3060

Adapted from the jQuery docs:

$("textarea").change( function() {
    // check input ($(this).val()) for validity here
});

That's for detecting the changes. The cleaning itself would probably be a regex of sorts.

Edit: changed the selector above to look for a textarea, not a text box.

Upvotes: 0
