Reputation: 1740
I am trying to remove MS Word formatting information from the text in my textarea, but I can't work out how to do it. I need to copy and paste some content from MS Word into a textbox editor. The text itself copies fine, but all the formatting is copied along with it, so my 300-character sentence expands into a 20,000-character formatted one. Can anyone suggest what to do?
OK, after some R&D I have made some progress.
Here's the text that I copied from the Word document:
Once the user clicks on the Cancel icon for a transaction on the Status of Business, and the transaction is eligible for cancellation, a new screen titled “Cancel Transaction” will appear, with the following fields:
Here's what I get from $("#textAreaId").val():
"
Normal
0
false
false
false
EN-US
X-NONE
X-NONE
Once the user clicks on the Cancel icon for a
transaction on the Status of Business, and the transaction is eligible for
cancellation, a new screen titled “Cancel Transaction” will appear, with the
following fields:
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-style-parent:"";
line-height:115%;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-bidi-font-family:"Times New Roman";}
"
Upvotes: 1
Views: 4406
Reputation: 1740
I finally found the solution. Here it is:
// Removes MS Office generated guff from pasted HTML
function cleanHTML(input) {
    // 1. Replace line breaks and Mso class attributes with a space
    var stringStripper = /(\n|\r| class=(")?Mso[a-zA-Z]+(")?)/g;
    var output = input.replace(stringStripper, ' ');
    // 2. Strip Word-generated HTML comments
    var commentStripper = new RegExp('<!--(.*?)-->', 'g');
    output = output.replace(commentStripper, '');
    // 3. Remove these tags but leave their content, if any
    var tagStripper = new RegExp('<(/)*(meta|link|span|\\?xml:|st1:|o:|font)(.*?)>', 'gi');
    output = output.replace(tagStripper, '');
    // 4. Remove these elements entirely, including everything between their tags
    var badTags = ['style', 'script', 'applet', 'embed', 'noframes', 'noscript'];
    for (var i = 0; i < badTags.length; i++) {
        tagStripper = new RegExp('<' + badTags[i] + '.*?' + badTags[i] + '(.*?)>', 'gi');
        output = output.replace(tagStripper, '');
    }
    // 5. Remove unwanted attributes such as ' style="..."'
    var badAttributes = ['style', 'start'];
    for (i = 0; i < badAttributes.length; i++) {
        var attributeStripper = new RegExp(' ' + badAttributes[i] + '="(.*?)"', 'gi');
        output = output.replace(attributeStripper, '');
    }
    return output;
}
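Since step 1 turns every line break into a space and the later steps delete tags without touching the text around them, the cleaned string can end up with long runs of spaces. Below is a minimal sketch of a follow-up tidy step. The name collapseWhitespace is hypothetical and not part of the function above, and the &nbsp; handling assumes the pasted markup contains that entity, which may not hold for every paste.

```javascript
// Hypothetical helper (not part of the original solution): tidies the
// whitespace left over after tags and line breaks have been stripped.
function collapseWhitespace(text) {
    return text
        .replace(/&nbsp;/gi, ' ')  // assumption: treat &nbsp; entities as plain spaces
        .replace(/\s+/g, ' ')      // collapse runs of spaces, tabs and line breaks
        .trim();                   // drop leading and trailing whitespace
}

var sample = '  Once the user   clicks&nbsp;on\r\n the Cancel icon  ';
console.log(collapseWhitespace(sample));
```

It could be chained onto the cleaner, for example collapseWhitespace(cleanHTML(pastedText)), before writing the value back to the textarea.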
Upvotes: 7