user1020971
user1020971

Reputation: 3

VBA to open .doc in "Recover text from any file" mode

I am trying to convert many old .DOC files to either PDF format or RTF format. Thus far I have found one that accomplishes the latter (conversion to RTF), however the formatting from the old Word application is still present in the documents. If you open Microsoft Word (I am using 2010) and click File > Open, there is a dropdown menu that allows you to select "Recover Text From Any File(.)". Is it possible to use this in the conversion process to filter out the formatting data in the .DOC documents? Below are a couple of examples of the script I am presently trying to modify:

This one has worked though it appears to only be appending .rtf to the end of the file rather than changing the format:

Sub SaveAllAsDOCX()
Dim strFilename As String
Dim strDocName As String
Dim strPath As String
Dim oDoc As Document
Dim fDialog As FileDialog
Dim intPos As Integer
Set fDialog = Application.FileDialog(msoFileDialogFolderPicker)
With fDialog
    .Title = "Select folder and click OK"
    .AllowMultiSelect = False
    ..InitialView = msoFileDialogViewList
    If .Show <> -1 Then
        MsgBox "Cancelled By User", , "List Folder Contents"
        Exit Sub
    End If
    strPath = fDialog.SelectedItems.Item(1)
    If Right(strPath, 1) <> "\" Then strPath = strPath + "\"
End With
If Documents.Count > 0 Then
    Documents.Close SaveChanges:=wdPromptToSaveChanges
End If
If Left(strPath, 1) = Chr(34) Then
    strPath = Mid(strPath, 2, Len(strPath) - 2)
End If
strFilename = Dir$(strPath & "*.doc")
While Len(strFilename) <> 0
    Set oDoc = Documents.Open(strPath & strFilename)
    strDocName = ActiveDocument.FullName
    intPos = InStrRev(strDocName, ".")
    strDocName = Left(strDocName, intPos - 1)
    strDocName = strDocName & ".docx"
    oDoc.SaveAs FileName:=strDocName, _
        FileFormat:=wdFormatDocumentDefault
    oDoc.Close SaveChanges:=wdDoNotSaveChanges
    strFilename = Dir$()
Wend
End Sub

This one has not been successful so far in any conversions:

Option Explicit
Sub ChangeDocsToTxtOrRTFOrHTML()
'with export to PDF in Word 2007
    Dim fs As Object
    Dim oFolder As Object
    Dim tFolder As Object
    Dim oFile As Object
    Dim strDocName As String
    Dim intPos As Integer
    Dim locFolder As String
    Dim fileType As String
    On Error Resume Next
    locFolder = InputBox("Enter the folder path to DOCs", "File Conversion", "C:\myDocs")
    Select Case Application.Version
        Case Is < 12
            Do
                fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML", "File Conversion", "TXT"))
            Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML")
        Case Is >= 12
            Do
                fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML or PDF(2007+ only)", "File Conversion", "TXT"))
            Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML" Or fileType = "PDF")
    End Select
    Application.ScreenUpdating = False
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set oFolder = fs.GetFolder(locFolder)
    Set tFolder = fs.CreateFolder(locFolder & "Converted")
    Set tFolder = fs.GetFolder(locFolder & "Converted")
    For Each oFile In oFolder.Files
        Dim d As Document
        Set d = Application.Documents.Open(oFile.Path)
        strDocName = ActiveDocument.Name
        intPos = InStrRev(strDocName, ".")
        strDocName = Left(strDocName, intPos - 1)
        ChangeFileOpenDirectory tFolder
        Select Case fileType
        Case Is = "TXT"
            strDocName = strDocName & ".txt"
            ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatText
        Case Is = "RTF"
            strDocName = strDocName & ".rtf"
            ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatRTF
        Case Is = "HTML"
            strDocName = strDocName & ".html"
            ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatFilteredHTML
        Case Is = "PDF"
            strDocName = strDocName & ".pdf"

            ' *** Word 2007 users - remove the apostrophe at the start of the next line ***
            'ActiveDocument.ExportAsFixedFormat OutputFileName:=strDocName, ExportFormat:=wdExportFormatPDF

        End Select
        d.Close
        ChangeFileOpenDirectory oFolder
    Next oFile
    Application.ScreenUpdating = True
End Sub

Upvotes: 0

Views: 1583

Answers (1)

Alexandre
Alexandre

Reputation: 120

I will cover one way, using a VBA script, to do what you want, without having to use Word's built-in "Recover text from any file" mode functionality.

It converts every .doc/.docx in one directory to .txt, but can be used to convert to any other format supported by the parent Application (I tested with Word 2010). As follows:

'------------ VBA script start -------------
Sub one1()
Set fs = CreateObject("Scripting.FileSystemObject")
Set list1 = fs.GetFolder(ActiveDocument.Path)
For Each fl In list1.files
  If InStr(fl.Type, "Word") >= 1 And Not fl.Path = ActiveDocument.Path & "\" & ActiveDocument.Name Then
    Set wordapp = CreateObject("word.Application")
    Set Doc1 = wordapp.Documents.Open(fl.Path)
    'wordapp.Visible = True
    Doc1.SaveAs2 FileName:=fl.Name & ".txt", fileformat:=wdFormatText
    wordapp.Quit
  End If
Next
End Sub
'------------ VBA script start -------------

to save as PDF, use

Doc1.SaveAs2 FileName:=fl.Name & ".pdf", fileformat:=wdFormatPDF

instead

to save as RTF, use

Doc1.SaveAs2 FileName:=fl.Name & ".rtf", fileformat:=wdFormatRTF 

instead

or, say, HTML:

Doc1.SaveAs2 FileName:=fl.Name & ".html", fileformat:=wdFormatHTML

and so on.

Some drawbacks that I didn't bother checking, because they are inoffensive:

  • at the end of execution a error message pops up, but with no consequence whatsoever.

  • it tries to open itself, since it's a VBA script inside a document itself, and it's a document opener script. And then you will have to instruct 'him' to open it manunally read-only when a message pops up.

  • it will save all documents into C:\users\username\Documents , instead of the one where it was executed from, what would be better in most instances.

  • slow process, expect a 2-3 documents/second speed in most ordinary personal computers.

Upvotes: 1

Related Questions