Yasser

Reputation: 315

Delete blank pages from Word Open XML

I have successfully generated a Word document using Open XML, but it contains too many blank pages. How can I remove them?

Upvotes: 3

Views: 4155

Answers (1)

Avi Shmidman

Reputation: 940

This depends on how those blank pages are represented in the Open XML; you may want to post a sample document to demonstrate exactly how your blank pages are represented.

But let's take the case of a Word document in which the user has inserted extra page breaks (by pressing Ctrl+Enter in Word), resulting in blank pages. These page breaks are represented in the XML as:

<w:br w:type="page"/>  

The page will still contain plenty of tags for spacing, fonts, etc., and it may display headers and footers, too. But let's define a blank page as one that contains no new text. In Open XML, text content is stored in w:t elements.
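To make this definition concrete, a rough Python check might split the body XML on page breaks and report which chunks contain no w:t element. This is a sketch, not part of the original answer: the helper name blank_segments is hypothetical, and a chunk between explicit breaks only approximates a rendered page, since Word also paginates automatically. The pattern accepts attribute-bearing tags such as <w:t xml:space="preserve"> while ignoring <w:tab/>, <w:tbl> and similar elements.

```python
import re

# Page-break element; \s* tolerates serializers that write a space before "/>".
PAGE_BREAK = re.compile(r'<w:br w:type="page"\s*/>')
# A w:t element with or without attributes; the [ >] guard keeps
# <w:tab/>, <w:tbl>, <w:tc>, <w:tr> and friends from matching.
TEXT_ELEMENT = re.compile(r'<w:t[ >]')


def blank_segments(body_xml: str) -> list[int]:
    """Return the indices of the chunks between page breaks that contain no w:t element."""
    segments = PAGE_BREAK.split(body_xml)
    return [i for i, seg in enumerate(segments) if not TEXT_ELEMENT.search(seg)]


sample = (
    '<w:p><w:r><w:t>Page one</w:t></w:r></w:p>'
    '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'
    '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'
    '<w:p><w:r><w:t xml:space="preserve">Page three</w:t></w:r></w:p>'
)
print(blank_segments(sample))  # [1]
```

Only the chunk between the two consecutive breaks is flagged, because it holds nothing but run and paragraph markup.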

So, to remove blank pages created by extra page breaks with no text in between, we can run the following regular expression on the main document part (typically word/document.xml), with the dot matching newlines (single-line mode), replacing matches with an empty string (""):

<w:br w:type="page"\s*/>((?!<w:t[ >]).)*(?=<w:br w:type="page"\s*/>)

This regex will search for a series of two or more page breaks with no new text in between, removing all but the last one.
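To try the regex approach from Python's re module, a minimal sketch follows (the function name remove_blank_pages is hypothetical). It uses the tempered form ((?!<w:t[ >]).)*, which checks for a text element before consuming each character. That way it also respects <w:t xml:space="preserve"> and a <w:t> sitting immediately after the first break, and re.DOTALL lets the dot cross newlines in pretty-printed XML. Because it operates on raw text rather than parsed XML, the output is worth checking in Word.

```python
import re

# A page break, then any characters as long as no w:t element (with or
# without attributes) starts at that position, ending just before another
# page break. re.DOTALL lets "." match newlines in the XML.
REDUNDANT_BREAKS = re.compile(
    r'<w:br w:type="page"\s*/>'
    r'(?:(?!<w:t[ >]).)*'
    r'(?=<w:br w:type="page"\s*/>)',
    re.DOTALL,
)


def remove_blank_pages(document_xml: str) -> str:
    """Collapse each run of page breaks with no w:t text between them to its last break."""
    return REDUNDANT_BREAKS.sub("", document_xml)


sample = (
    '<w:p><w:r><w:t>Intro</w:t></w:r></w:p>'
    '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'
    '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'
    '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'
    '<w:p><w:r><w:t>Next</w:t></w:r></w:p>'
)
print(remove_blank_pages(sample).count('w:type="page"'))  # 1
```

In the sample, the removed span runs from the first break up to the third, so the empty paragraphs in between disappear and the surrounding markup stays balanced.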

(Note that this won't take care of blank pages at the end of the document, which is a bit trickier. Also, if you want to account for pages that contain only images, text boxes, etc., the regex will have to be expanded to include the relevant elements.)
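Putting this together for a whole .docx package, one possible sketch uses only Python's standard library. It is an illustration under assumptions, not the answerer's code: the function name strip_blank_pages is hypothetical, and the main part is assumed to live at word/document.xml. It also widens the "content" test to w:drawing (DrawingML images) and w:pict (VML shapes and text boxes) so that pages holding only such items are kept. Other content types would have to be added to CONTENT in the same way, and trailing blank pages are still not handled.

```python
import io
import re
import zipfile

# A page break, tolerating a space before "/>".
BREAK = r'<w:br w:type="page"\s*/>'
# Elements that count as page content: text, DrawingML images, VML shapes/text boxes.
CONTENT = r'<w:(?:t|drawing|pict)[ >]'
REDUNDANT_BREAKS = re.compile(
    BREAK + r'(?:(?!' + CONTENT + r').)*(?=' + BREAK + r')',
    re.DOTALL,
)


def strip_blank_pages(docx: bytes) -> tuple[bytes, int]:
    """Return a copy of the .docx with redundant page breaks removed from
    word/document.xml, together with the number of removals."""
    removed = 0
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == "word/document.xml":
                xml, removed = REDUNDANT_BREAKS.subn("", data.decode("utf-8"))
                data = xml.encode("utf-8")
            # Every other part of the package is copied unchanged.
            dst.writestr(name, data)
    return out.getvalue(), removed
```

Usage would be along the lines of reading the generated file with pathlib's Path("generated.docx").read_bytes(), passing the bytes to strip_blank_pages, and writing the returned bytes to a new file. Zip archives cannot be edited in place, so rebuilding the package is the simpler route.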

Upvotes: 1
