Reputation: 1368
I want a user to be able to upload a Word document that my program then splits into separate Word documents. The split points have to be chosen manually because the documents are not all formatted the same way. My initial thought is that, before uploading, the user tags each section with a beginning and an end tag (of some sort, maybe a comment) that my program can then parse to split the document into separate documents. (This also needs to work for both .doc and .docx, so a common solution is desirable.)
Ex. Input:
Doc1
Chapter 1
Blah Blah Blah
Chapter 2
Blah blah
/end Doc1
Ex. Output:
Doc1
Chapter 1
Blah Blah Blah
/end Doc1
Doc2
Chapter 2
Blah blah
/end Doc2
Any ideas? I have been struggling with this for a while.
Upvotes: 3
Views: 3311
Reputation: 1
VBA Macro to split files into sub documents
Sub UpdateDocuments()
Application.ScreenUpdating = False
Dim strFolder As String, strFile As String, wdDoc As Document
strFolder = GetFolder
If strFolder = "" Then Exit Sub
strFile = Dir(strFolder & "\*.doc", vbNormal)
While strFile <> ""
Set wdDoc = Documents.Open(FileName:=strFolder & "\" & strFile, AddToRecentFiles:=False, Visible:=False)
With wdDoc
'Call your other macro or insert its code here
'BreakOnSection
wdDoc.Activate
ActiveDocument.ActiveWindow.View.Type = wdOutlineView
Selection.WholeStory
Selection.Copy
ActiveDocument.Subdocuments.AddFromRange Range:=Selection.Range
ActiveDocument.SaveAs "C:\Data\Split\" & ActiveDocument.Name
ActiveDocument.Close SaveChanges:=True
End With
strFile = Dir()
Wend
Set wdDoc = Nothing
Application.ScreenUpdating = True
End Sub
Function GetFolder() As String
Dim oFolder As Object
GetFolder = ""
Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, _
"Choose a folder", 0)
If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
Set oFolder = Nothing
End Function
Upvotes: 0
Reputation: 2915
I've had great success with Aspose.Words for document manipulation and generation.
Upvotes: 0
Reputation: 3416
Something that may help is HTML Transit. It's incredibly old and incredibly expensive software, and from an initial search it may not be supported anymore. But it did have the ability to take one Word document and split it up into smaller pieces (of course, it converted it to HTML as well). Something to look into, maybe. Google "HTML Transit" for more information and a free demo.
Upvotes: 0
Reputation: 2714
What you want to do is non-trivial! I have done my fair share of document manipulation. That said, if you are working with DOCX these days, it is not too bad thanks to the supporting libraries, see:
Older versions get more difficult: you would need to source a library for them or, as suggested, use macros.
Is the "program" a web site? If so, make sure you do not use COM interop! Microsoft does not support automating Office on a server.
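For the DOCX case, the marker idea from the question could be sketched roughly as below. This is only a minimal illustration in Python using the standard library, not a finished splitter: it assumes the markers are plain paragraphs of the form "Doc1" and "/end Doc1" (modelled on the question's example), the function names read_docx_paragraphs and split_by_markers are made up for this sketch, and it extracts text only, so formatting, images and tables are lost. It does not handle the binary .doc format.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Assumed marker convention, modelled on the question's example:
# a paragraph "Doc1" opens a section and "/end Doc1" closes it.
START_RE = re.compile(r"^(Doc\d+)$")
END_RE = re.compile(r"^/end (Doc\d+)$")


def read_docx_paragraphs(source):
    """Return the plain text of every paragraph in a .docx file or file object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is spread over one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs


def split_by_markers(paragraphs):
    """Group paragraphs into {section name: [paragraph texts]} using the markers."""
    sections = {}
    current = None  # name of the section currently open, if any
    for text in paragraphs:
        stripped = text.strip()
        start = START_RE.match(stripped)
        end = END_RE.match(stripped)
        if current is None and start:
            current = start.group(1)
            if current in sections:
                raise ValueError(f"section {current} is tagged twice")
            sections[current] = []
        elif current is not None and end:
            if end.group(1) != current:
                raise ValueError(f"expected '/end {current}', found '{stripped}'")
            current = None
        elif current is not None:
            sections[current].append(text)
        # Paragraphs outside any tagged section are ignored.
    if current is not None:
        raise ValueError(f"missing '/end {current}'")
    return sections
```

Each entry of the returned dictionary could then be written out as its own document with whichever DOCX library you settle on. Raising on an unmatched or mismatched tag is a deliberate choice here, since the tags are typed by hand and mistakes are likely.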
Upvotes: 4
Reputation: 8336
I'd say your best bet is to investigate VSTO or VBA macros to accomplish this. Both will give you full access to the object model in whatever version the document is.
Upvotes: 0