Reputation: 3
I am trying to convert many old .DOC files to either PDF format or RTF format. Thus far I have found one that accomplishes the latter (conversion to RTF), however the formatting from the old Word application is still present in the documents. If you open Microsoft Word (I am using 2010) and click File > Open, there is a dropdown menu that allows you to select "Recover Text From Any File(.)". Is it possible to use this in the conversion process to filter out the formatting data in the .DOC documents? Below are a couple of examples of the script I am presently trying to modify:
This one has worked though it appears to only be appending .rtf to the end of the file rather than changing the format:
Sub SaveAllAsDOCX()
Dim strFilename As String
Dim strDocName As String
Dim strPath As String
Dim oDoc As Document
Dim fDialog As FileDialog
Dim intPos As Integer
Set fDialog = Application.FileDialog(msoFileDialogFolderPicker)
With fDialog
.Title = "Select folder and click OK"
.AllowMultiSelect = False
..InitialView = msoFileDialogViewList
If .Show <> -1 Then
MsgBox "Cancelled By User", , "List Folder Contents"
Exit Sub
End If
strPath = fDialog.SelectedItems.Item(1)
If Right(strPath, 1) <> "\" Then strPath = strPath + "\"
End With
If Documents.Count > 0 Then
Documents.Close SaveChanges:=wdPromptToSaveChanges
End If
If Left(strPath, 1) = Chr(34) Then
strPath = Mid(strPath, 2, Len(strPath) - 2)
End If
strFilename = Dir$(strPath & "*.doc")
While Len(strFilename) <> 0
Set oDoc = Documents.Open(strPath & strFilename)
strDocName = ActiveDocument.FullName
intPos = InStrRev(strDocName, ".")
strDocName = Left(strDocName, intPos - 1)
strDocName = strDocName & ".docx"
oDoc.SaveAs FileName:=strDocName, _
FileFormat:=wdFormatDocumentDefault
oDoc.Close SaveChanges:=wdDoNotSaveChanges
strFilename = Dir$()
Wend
End Sub
This one has not been successful so far in any conversions:
Option Explicit
Sub ChangeDocsToTxtOrRTFOrHTML()
'with export to PDF in Word 2007
Dim fs As Object
Dim oFolder As Object
Dim tFolder As Object
Dim oFile As Object
Dim strDocName As String
Dim intPos As Integer
Dim locFolder As String
Dim fileType As String
On Error Resume Next
locFolder = InputBox("Enter the folder path to DOCs", "File Conversion", "C:\myDocs")
Select Case Application.Version
Case Is < 12
Do
fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML", "File Conversion", "TXT"))
Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML")
Case Is >= 12
Do
fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML or PDF(2007+ only)", "File Conversion", "TXT"))
Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML" Or fileType = "PDF")
End Select
Application.ScreenUpdating = False
Set fs = CreateObject("Scripting.FileSystemObject")
Set oFolder = fs.GetFolder(locFolder)
Set tFolder = fs.CreateFolder(locFolder & "Converted")
Set tFolder = fs.GetFolder(locFolder & "Converted")
For Each oFile In oFolder.Files
Dim d As Document
Set d = Application.Documents.Open(oFile.Path)
strDocName = ActiveDocument.Name
intPos = InStrRev(strDocName, ".")
strDocName = Left(strDocName, intPos - 1)
ChangeFileOpenDirectory tFolder
Select Case fileType
Case Is = "TXT"
strDocName = strDocName & ".txt"
ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatText
Case Is = "RTF"
strDocName = strDocName & ".rtf"
ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatRTF
Case Is = "HTML"
strDocName = strDocName & ".html"
ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatFilteredHTML
Case Is = "PDF"
strDocName = strDocName & ".pdf"
' *** Word 2007 users - remove the apostrophe at the start of the next line ***
'ActiveDocument.ExportAsFixedFormat OutputFileName:=strDocName, ExportFormat:=wdExportFormatPDF
End Select
d.Close
ChangeFileOpenDirectory oFolder
Next oFile
Application.ScreenUpdating = True
End Sub
Upvotes: 0
Views: 1583
Reputation: 120
I will cover one way, using a VBA script, to do what you want, without having to use Word's built-in "Recover text from any file" mode functionality.
It converts every .doc/.docx in one directory to .txt, but can be used to convert to any other format supported by the parent Application (I tested with Word 2010). As follows:
'------------ VBA script start -------------
Sub one1()
Set fs = CreateObject("Scripting.FileSystemObject")
Set list1 = fs.GetFolder(ActiveDocument.Path)
For Each fl In list1.files
If InStr(fl.Type, "Word") >= 1 And Not fl.Path = ActiveDocument.Path & "\" & ActiveDocument.Name Then
Set wordapp = CreateObject("word.Application")
Set Doc1 = wordapp.Documents.Open(fl.Path)
'wordapp.Visible = True
Doc1.SaveAs2 FileName:=fl.Name & ".txt", fileformat:=wdFormatText
wordapp.Quit
End If
Next
End Sub
'------------ VBA script start -------------
to save as PDF, use
Doc1.SaveAs2 FileName:=fl.Name & ".pdf", fileformat:=wdFormatPDF
instead
to save as RTF, use
Doc1.SaveAs2 FileName:=fl.Name & ".rtf", fileformat:=wdFormatRTF
instead
or, say, HTML:
Doc1.SaveAs2 FileName:=fl.Name & ".html", fileformat:=wdFormatHTML
and so on.
Some drawbacks that I didn't bother checking, because they are inoffensive:
at the end of execution a error message pops up, but with no consequence whatsoever.
it tries to open itself, since it's a VBA script inside a document itself, and it's a document opener script. And then you will have to instruct 'him' to open it manunally read-only when a message pops up.
it will save all documents into C:\users\username\Documents , instead of the one where it was executed from, what would be better in most instances.
slow process, expect a 2-3 documents/second speed in most ordinary personal computers.
Upvotes: 1