Gautam

Reputation: 1740

jQuery: Remove MS Word formatting from a text area

I am trying to remove MS Word formatting from the content of my text area, but I can't work out how. I need to copy and paste content from MS Word into a textbox editor. The text is copied correctly, but all the formatting is copied with it, so my 300-character sentence expands to 20,000 characters of formatted markup. Can anyone suggest what to do?

OK, after some R&D I have made some progress.

Here's the text that I copied from the Word document:

Once the user clicks on the Cancel icon for a transaction on the Status of Business, and the transaction is eligible for cancellation, a new screen titled “Cancel Transaction” will appear, with the following fields: 

Here's what I get from $("#textAreaId").val():

"

  Normal
  0




  false
  false
  false

  EN-US
  X-NONE
  X-NONE

Once the user clicks on the Cancel icon for a
transaction on the Status of Business, and the transaction is eligible for
cancellation, a new screen titled “Cancel Transaction” will appear, with the
following fields: 



 /* Style Definitions */
 table.MsoNormalTable
    {mso-style-name:"Table Normal";
    mso-style-parent:"";
    line-height:115%;
    font-size:11.0pt;
    font-family:"Calibri","sans-serif";
    mso-bidi-font-family:"Times New Roman";}

"

Upvotes: 1

Views: 4406

Answers (1)

Gautam

Reputation: 1740

I finally found a solution. Here it is:

// Removes MS Office generated markup from pasted HTML
function cleanHTML(input) {
  // 1. Replace line breaks and Mso class attributes with a space
  var stringStripper = /(\n|\r| class=(")?Mso[a-zA-Z]+(")?)/g;
  var output = input.replace(stringStripper, ' ');
  // 2. Strip Word-generated HTML comments
  var commentStripper = new RegExp('<!--(.*?)-->', 'g');
  output = output.replace(commentStripper, '');
  // 3. Remove these tags but keep any content between them
  var tagStripper = new RegExp('<(/)*(meta|link|span|\\?xml:|st1:|o:|font)(.*?)>', 'gi');
  output = output.replace(tagStripper, '');
  // 4. Remove these elements together with everything between their tags,
  //    e.g. '<style ...> ... </style>'
  var badTags = ['style', 'script', 'applet', 'embed', 'noframes', 'noscript'];
  var i;
  for (i = 0; i < badTags.length; i++) {
    tagStripper = new RegExp('<' + badTags[i] + '.*?' + badTags[i] + '(.*?)>', 'gi');
    output = output.replace(tagStripper, '');
  }
  // 5. Remove attributes such as ' style="..."'
  var badAttributes = ['style', 'start'];
  for (i = 0; i < badAttributes.length; i++) {
    var attributeStripper = new RegExp(' ' + badAttributes[i] + '="(.*?)"', 'gi');
    output = output.replace(attributeStripper, '');
  }
  return output;
}
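
To apply the function to the text area from the question, one option is to run it whenever the field's content changes. The sketch below is only an illustration: `attachCleaner` is a hypothetical helper name, it uses the plain DOM `input` event rather than jQuery, and it assumes the pasted markup actually ends up in the field's value, as in the question.

```javascript
// Hypothetical helper (not part of the original answer): runs a cleaning
// function over a field's value every time the value changes.
// `field` is any object with `value` and `addEventListener`, e.g. a <textarea>.
// `clean` is a string -> string function, e.g. cleanHTML from the answer.
function attachCleaner(field, clean) {
  // 'input' fires after the value has been updated, so a paste is already
  // reflected in field.value when the handler runs.
  field.addEventListener('input', function () {
    var cleaned = clean(field.value);
    // Write back only when something changed, to avoid resetting the caret
    // on every keystroke.
    if (cleaned !== field.value) {
      field.value = cleaned;
    }
  });
}

// Browser usage, assuming a <textarea id="textAreaId"> as in the question:
// attachCleaner(document.getElementById('textAreaId'), cleanHTML);
```

Passing the cleaning function in as an argument keeps the helper independent of the particular regular expressions, so they can be adjusted without touching the event wiring.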

Upvotes: 7
