trueimage
trueimage

Reputation: 319

Is there a better way to check if files exist using Excel VBA

I have a folder with thousands of files, and a spreadsheet that has 2 pieces of information:

DocumentNumber       Revision
00-STD-GE-1234-56       3

I need to find and concatenate all files in the folder than match this document number and revision combination into this format:

00-STD-GE-1234-56_3.docx|00-STD-GE-1234-56_3.pdf

The pdf must be last sometimes the file is named without the last 3 chars of the document number (if they are -00 they are left off) sometimes the revision is separated using "_" and sometimes using "_r"

I have the code working, but it takes a long time (this sheet has over 7000 rows, and this code is 20 file comparisons per row against a network file system), is there an optimization for this?

''=============================================================================
 Enum IsFileOpenStatus
        ExistsAndClosedOrReadOnly = 0
        ExistsAndOpenSoBlocked = 1
        NotExists = 2
End Enum
''=============================================================================

Function IsFileReadOnlyOpen(FileName As String) As IsFileOpenStatus

'ExistsAndClosedOrReadOnly = 0
'ExistsAndOpenSoBlocked = 1
'NotExists = 2

With New FileSystemObject
        If Not .FileExists(FileName) Then
                    IsFileReadOnlyOpen = 2  '  NotExists = 2
                    Exit Function 'Or not - I don't know if you want to create the file or exit in that case.
        End If
End With

Dim iFilenum As Long
Dim iErr As Long
        On Error Resume Next
                    iFilenum = FreeFile()
                    Open FileName For Input Lock Read As #iFilenum
                    Close iFilenum
                    iErr = Err
        On Error GoTo 0

Select Case iErr
    Case 0: IsFileReadOnlyOpen = 0 'ExistsAndClosedOrReadOnly = 0
    Case 70: IsFileReadOnlyOpen = 1 'ExistsAndOpenSoBlocked = 1
    Case Else: IsFileReadOnlyOpen = 1 'Error iErr
End Select

End Function    'IsFileReadOnlyOpen
''=============================================================================

Function BuildAndCheckPath(sMasterPath As String, sLegacyDocNum As String, sRevision As String) As String
Dim sLegacyDocNumNoSheet As String
sLegacyDocNumNoSheet = Left(sLegacyDocNum, Len(sLegacyDocNum) - 3)
Dim sFileExtensions
sFileExtensions = Array(".doc", ".docx", ".xls", ".xlsx", ".pdf")
Dim sRevisionSpacer
sRevisionSpacer = Array("_", "_r")
Dim i As Long
Dim j As Long
Dim sResult As String

'for each revision spacer option
For i = LBound(sRevisionSpacer) To UBound(sRevisionSpacer)
'for each file extension
For j = LBound(sFileExtensions) To UBound(sFileExtensions)
    'Check if the file exists (assume a sheet number i.e. 00-STD-GE-1234-56)
    If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
        If sResult = "" Then
            sResult = sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)
        Else
            sResult = sResult & "|" & sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)
        End If
    End If
    'Do it again without a sheet number in the filename (last 3 digits stripped off legacy number)
    If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
        If sResult = "" Then
            sResult = sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)
        Else
            sResult = sResult & "|" & sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)
        End If
    End If
Next j
Next i

BuildAndCheckPath = sResult

End Function

Upvotes: 0

Views: 1335

Answers (2)

ARich
ARich

Reputation: 3279

It's hard to tell without seeing your dataset, but perhaps this approach could be implemented (note the use of Wildcards):

UNTESTED

Const Folder As String = "C:\YourFolder\"  
Dim File as Object
Dim XLSFile As String
Dim PDFFile As String
Dim ConCat() As String
Dim DocNos() As Variant
Dim DocRev() As Variant
Dim i As Long

DocNos = Range("A1:A10") '<--Your list of Document #s.
DocRev = Range("B1:B10") '<--Your list of Revision #s.
ReDim ConCat(1 To UBound(DocNos))

'Loop through your Document numbers.
For i = LBound(DocNos) To UBound(DocNos)
    'Loop through the folder.
    File = Dir(Folder)
    Do While File <> ""
        'Check the filename against the Document number. Use a wildcard at this _
        'point as a sort of "gatekeeper"
        If File Like Left(DocNos(i), Len(DocNos(i)) - 3) & "*"
            'If the code makes it to this point, you just need to match file _
            'type and revision.
            If File Like "*_*" & DocRev(i) And File Like "*.xls*" Then
                XLSFile = File
            ElseIf File Like "*_*" & DocRev(i) File Like "*.pdf" Then
                PDFFile = File
            End If
            If XLSFile <> "" And PDFFile <> "" Then 
                ConCat(i) = XLSFile & "|" & PDFFile
                XLSFile = vbNullString
                PDFFile = vbNullString
            End If
        End If
        File = Dir
    Loop
Next i

To print the results to your sheet (Transpose pastes the results of the array in one column instead of putting the results in one row), you could use something like this:

Dim Rng As Range

Set Rng = Range("C1")
Rng.Resize(UBound(ConCat),1).Value = Application.Transpose(ConCat)

This approach loops through each document number from your spreadsheet, and then checks each file in the folder to see if it matches the document number, document type, and revision number. Once it finds a match for both .xls* and .pdf types, it concatenates the filenames together.

See this great SO post regarding looping through files.
See this site for more info about the Dir function.
See this article regarding wilcard character usage when comparing strings.

Hope that helps!

Upvotes: 1

Seems to me you are doing unnecessary file existence checks even in cases where a file has already been found. Assuming that talking with your network drive is indeed what takes up most of your execution time, then there's a place to optimise.

What you're doing is this:

If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
    'Great. Found it.
    '...
End If
'Do it again without a sheet number in the filename (last 3 digits stripped off legacy number)
'Wait a minute... why ask me to look again if I already found it?
'He must not mind the extra waiting time... ok, here we go again.
If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
    '...
End If

I think you want to look for your file under a different filename if and only if you haven't found it under the first filename pattern. Can do this using an Else clause:

If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
    'Great. Found it.
Else
    'Didn't find it using the first filename format.
    'Do it again without a sheet number in the filename (last 3 digits stripped off legacy number)
    If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
        'Great. Found it.
    Else
        Err.Raise 53, , _
            "File not found even though I looked for it in two places!"
    End If
End If

This can theoretically cut your number of tries by up to half; likely less in practice, but you'll get the largest benefit if you check the most common filename pattern first. The benefit will be proportionally larger if you have a greater number of filename patterns; from your question I understand you have 4 different combinations?

If you have more than 2 patterns to check, then nesting a bunch of Else clauses will look silly and be difficult to read; instead you can do something like this:

Dim foundIt As Boolean
foundIt = False
If Not foundIt And IsFileReadOnlyOpen(sMasterPath & sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
    'Great. Found it.
    foundIt = True
End If
If Not foundIt And IsFileReadOnlyOpen(sMasterPath & sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
        'Great. Found it.
        foundIt = True
End If
'...
'... check your other patterns here...
'...
If Not foundIt Then
    Err.Raise 53, , _
        "File not found even though I looked for it various places!"
End If

Upvotes: 0

Related Questions