Reputation: 319
I have a folder with thousands of files, and a spreadsheet that has 2 pieces of information:
DocumentNumber Revision
00-STD-GE-1234-56 3
I need to find and concatenate all files in the folder than match this document number and revision combination into this format:
00-STD-GE-1234-56_3.docx|00-STD-GE-1234-56_3.pdf
The pdf must be last sometimes the file is named without the last 3 chars of the document number (if they are -00 they are left off) sometimes the revision is separated using "_" and sometimes using "_r"
I have the code working, but it takes a long time (this sheet has over 7000 rows, and this code is 20 file comparisons per row against a network file system), is there an optimization for this?
''=============================================================================
Enum IsFileOpenStatus
ExistsAndClosedOrReadOnly = 0
ExistsAndOpenSoBlocked = 1
NotExists = 2
End Enum
''=============================================================================
Function IsFileReadOnlyOpen(FileName As String) As IsFileOpenStatus
'ExistsAndClosedOrReadOnly = 0
'ExistsAndOpenSoBlocked = 1
'NotExists = 2
With New FileSystemObject
If Not .FileExists(FileName) Then
IsFileReadOnlyOpen = 2 ' NotExists = 2
Exit Function 'Or not - I don't know if you want to create the file or exit in that case.
End If
End With
Dim iFilenum As Long
Dim iErr As Long
On Error Resume Next
iFilenum = FreeFile()
Open FileName For Input Lock Read As #iFilenum
Close iFilenum
iErr = Err
On Error GoTo 0
Select Case iErr
Case 0: IsFileReadOnlyOpen = 0 'ExistsAndClosedOrReadOnly = 0
Case 70: IsFileReadOnlyOpen = 1 'ExistsAndOpenSoBlocked = 1
Case Else: IsFileReadOnlyOpen = 1 'Error iErr
End Select
End Function 'IsFileReadOnlyOpen
''=============================================================================
Function BuildAndCheckPath(sMasterPath As String, sLegacyDocNum As String, sRevision As String) As String
Dim sLegacyDocNumNoSheet As String
sLegacyDocNumNoSheet = Left(sLegacyDocNum, Len(sLegacyDocNum) - 3)
Dim sFileExtensions
sFileExtensions = Array(".doc", ".docx", ".xls", ".xlsx", ".pdf")
Dim sRevisionSpacer
sRevisionSpacer = Array("_", "_r")
Dim i As Long
Dim j As Long
Dim sResult As String
'for each revision spacer option
For i = LBound(sRevisionSpacer) To UBound(sRevisionSpacer)
'for each file extension
For j = LBound(sFileExtensions) To UBound(sFileExtensions)
'Check if the file exists (assume a sheet number i.e. 00-STD-GE-1234-56)
If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
If sResult = "" Then
sResult = sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)
Else
sResult = sResult & "|" & sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)
End If
End If
'Do it again without a sheet number in the filename (last 3 digits stripped off legacy number)
If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
If sResult = "" Then
sResult = sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)
Else
sResult = sResult & "|" & sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)
End If
End If
Next j
Next i
BuildAndCheckPath = sResult
End Function
Upvotes: 0
Views: 1335
Reputation: 3279
It's hard to tell without seeing your dataset, but perhaps this approach could be implemented (note the use of Wildcards
):
UNTESTED
Const Folder As String = "C:\YourFolder\"
Dim File as Object
Dim XLSFile As String
Dim PDFFile As String
Dim ConCat() As String
Dim DocNos() As Variant
Dim DocRev() As Variant
Dim i As Long
DocNos = Range("A1:A10") '<--Your list of Document #s.
DocRev = Range("B1:B10") '<--Your list of Revision #s.
ReDim ConCat(1 To UBound(DocNos))
'Loop through your Document numbers.
For i = LBound(DocNos) To UBound(DocNos)
'Loop through the folder.
File = Dir(Folder)
Do While File <> ""
'Check the filename against the Document number. Use a wildcard at this _
'point as a sort of "gatekeeper"
If File Like Left(DocNos(i), Len(DocNos(i)) - 3) & "*"
'If the code makes it to this point, you just need to match file _
'type and revision.
If File Like "*_*" & DocRev(i) And File Like "*.xls*" Then
XLSFile = File
ElseIf File Like "*_*" & DocRev(i) File Like "*.pdf" Then
PDFFile = File
End If
If XLSFile <> "" And PDFFile <> "" Then
ConCat(i) = XLSFile & "|" & PDFFile
XLSFile = vbNullString
PDFFile = vbNullString
End If
End If
File = Dir
Loop
Next i
To print the results to your sheet (Transpose
pastes the results of the array in one column instead of putting the results in one row), you could use something like this:
Dim Rng As Range
Set Rng = Range("C1")
Rng.Resize(UBound(ConCat),1).Value = Application.Transpose(ConCat)
This approach loops through each document number from your spreadsheet, and then checks each file in the folder to see if it matches the document number, document type, and revision number. Once it finds a match for both .xls* and .pdf types, it concatenates the filenames together.
See this great SO post regarding looping through files.
See this site for more info about the Dir
function.
See this article regarding wilcard character usage when comparing strings.
Hope that helps!
Upvotes: 1
Reputation: 38520
Seems to me you are doing unnecessary file existence checks even in cases where a file has already been found. Assuming that talking with your network drive is indeed what takes up most of your execution time, then there's a place to optimise.
What you're doing is this:
If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
'Great. Found it.
'...
End If
'Do it again without a sheet number in the filename (last 3 digits stripped off legacy number)
'Wait a minute... why ask me to look again if I already found it?
'He must not mind the extra waiting time... ok, here we go again.
If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
'...
End If
I think you want to look for your file under a different filename if and only if you haven't found it under the first filename pattern. Can do this using an Else
clause:
If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
'Great. Found it.
Else
'Didn't find it using the first filename format.
'Do it again without a sheet number in the filename (last 3 digits stripped off legacy number)
If IsFileReadOnlyOpen(sMasterPath & sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
'Great. Found it.
Else
Err.Raise 53, , _
"File not found even though I looked for it in two places!"
End If
End If
This can theoretically cut your number of tries by up to half; likely less in practice, but you'll get the largest benefit if you check the most common filename pattern first. The benefit will be proportionally larger if you have a greater number of filename patterns; from your question I understand you have 4 different combinations?
If you have more than 2 patterns to check, then nesting a bunch of Else
clauses will look silly and be difficult to read; instead you can do something like this:
Dim foundIt As Boolean
foundIt = False
If Not foundIt And IsFileReadOnlyOpen(sMasterPath & sLegacyDocNum & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
'Great. Found it.
foundIt = True
End If
If Not foundIt And IsFileReadOnlyOpen(sMasterPath & sLegacyDocNumNoSheet & sRevisionSpacer(i) & sRevision & sFileExtensions(j)) <> 2 Then
'Great. Found it.
foundIt = True
End If
'...
'... check your other patterns here...
'...
If Not foundIt Then
Err.Raise 53, , _
"File not found even though I looked for it various places!"
End If
Upvotes: 0