Reputation: 125
I am not sure whether this is possible, or even where to begin. I have a couple of thousand files whose names look like this:
nnnnnnnnnnnnnnnn.yyyyddmm.pdf (n = digit, yyyy = year, dd = day, and mm = month).
Among these files there are batches of related files that share the same nnnnnnnnnnnnnnnn
part of the file name but differ in the .yyyyddmm
part, which represents the date of the file. (These batches will be merged at a later point, but that is not important to this scenario.)
My question: is there a way to compare the yyyyddmm
part within each batch and copy the file with the most recent date to a different folder?
I am stuck because I do not know whether parts of a file name can be compared to find out which file has the most recent date. I know this could be done using the date-modified timestamp, but that will not always give me the file in the batch with the most recent date in its name.
Any thoughts? Please let me know if I should provide more information.
Upvotes: 0
Views: 878
Reputation: 38745
Trying to understand your problem/specs. Assume a loop over the files of your .pdf folder results in:
Looking at "0000000000012345.20120402.pdf"
Looking at "0000000000012345.20120502.pdf"
Looking at "0000000000012348.20121702.pdf"
Looking at "0000000000012346.20120802.pdf"
Looking at "0000000000012347.20121002.pdf"
Looking at "0000000000012348.20121602.pdf"
Looking at "0000000000012347.20121302.pdf"
Looking at "0000000000012347.20121202.pdf"
Looking at "0000000000012345.20120202.pdf"
Looking at "0000000000012348.20121502.pdf"
Looking at "0000000000012346.20120602.pdf"
Looking at "0000000000012346.20120902.pdf"
Looking at "0000000000012348.20121402.pdf"
Looking at "0000000000012346.20120702.pdf"
Looking at "0000000000012347.20121102.pdf"
Looking at "0000000000012345.20120302.pdf"
Would
Last file for 0000000000012345 is 0000000000012345.20120502.pdf
Last file for 0000000000012348 is 0000000000012348.20121702.pdf
Last file for 0000000000012346 is 0000000000012346.20120902.pdf
Last file for 0000000000012347 is 0000000000012347.20121302.pdf
identify the files to copy correctly? If yes, say so and I will post the code here.
First, you need a class to obtain and store the info put into the file names:
' cut & store info about file(names) like "0000000000012347.20121202.pdf"
Class cCut
  Private m_sN  ' complete file name
  Private m_sG  ' group/number prefix part
  Private m_dtF ' date part; converted to ease comparisons
  Public Function cut(reCut, sFiNa)
    Set cut = Me ' return self/this from function
    Dim oMTS : Set oMTS = reCut.Execute(sFiNa)
    If 1 = oMTS.Count Then
       m_sN = sFiNa
       Dim oSM : Set oSM = oMTS(0).SubMatches
       m_sG = oSM(0)
       m_dtF = DateSerial(oSM(1), oSM(3), oSM(2))
    Else
       ' Err.Raise
    End If
  End Function ' cut
  Public Property Get G() : G = m_sG  : End Property ' G
  Public Property Get D() : D = m_dtF : End Property ' D
  Public Property Get N() : N = m_sN  : End Property ' N
End Class ' cCut
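To see why the SubMatches are passed to DateSerial in the order (1), (3), (2), here is a minimal stand-alone sketch of the same parsing step. The function name dateFromName is made up for this illustration and is not part of the answer's code; it assumes the yyyyddmm layout from the question.

```vbscript
Option Explicit

' Hypothetical helper (not in the original answer): turn a name like
' "0000000000012347.20121202.pdf" into a Date, or Null if it does not match.
Function dateFromName(sFiNa)
  dateFromName = Null
  Dim reCut : Set reCut = New RegExp
  reCut.Pattern = "^(\d{16})\.(\d{4})(\d{2})(\d{2})\.pdf$"
  Dim oMTS : Set oMTS = reCut.Execute(sFiNa)
  If 1 = oMTS.Count Then
     Dim oSM : Set oSM = oMTS(0).SubMatches
     ' SubMatches: 0 = number prefix, 1 = year, 2 = day, 3 = month
     ' DateSerial expects (year, month, day), hence the order 1, 3, 2
     dateFromName = DateSerial(CInt(oSM(1)), CInt(oSM(3)), CInt(oSM(2)))
  End If
End Function

Dim dtA : dtA = dateFromName("0000000000012347.20121202.pdf") ' 12 Feb 2012
Dim dtB : dtB = dateFromName("0000000000012347.20121302.pdf") ' 13 Feb 2012
WScript.Echo "B is later than A:", CStr(dtA < dtB)
```

One thing to keep in mind: DateSerial does not reject out-of-range parts, it rolls them over (month 13 becomes January of the next year). A file that was accidentally named yyyymmdd would therefore not raise an error but silently get a wrong date.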
Then just loop over the .Files and check the date parts for each group stored in a dictionary (number prefix part used as key):
' The one and only .pdf folder - no recursion into subfolders!
Dim sTDir : sTDir = "..\data\test"
' dictionary to store the last/most recently used file for each group
Dim dicG : Set dicG = CreateObject("Scripting.Dictionary")
' RegExp to cut/parse file names like "0000000000012345.20120402.pdf"
Dim reCut : Set reCut = New RegExp
reCut.Pattern = "^(\d{16})\.(\d{4})(\d{2})(\d{2})\.pdf$"
Dim oFile
For Each oFile In goFS.GetFolder(sTDir).Files
  WScript.Echo "Looking at", qq(oFile.Name)
  ' an oCut object for each file name
  Dim oCut : Set oCut = New cCut.cut(reCut, oFile.Name)
  If Not dicG.Exists(oCut.G) Then
     ' new group, first file, assume this is the latest
     Set dicG(oCut.G) = oCut
  Else
     ' found a better one for this group?
     If dicG(oCut.G).D < oCut.D Then Set dicG(oCut.G) = oCut
  End If
Next
WScript.Echo "-----------------------"
Dim sG
For Each sG In dicG.Keys
  WScript.Echo "Last file for", sG, "is", dicG(sG).N
Next
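The loop calls qq(), which is not shown in the posted code. Judging from the sample output ("Looking at" followed by the quoted file name), it presumably just wraps a string in double quotes. A hypothetical stand-in, in case you want to run the snippet as is:

```vbscript
Option Explicit

' Hypothetical quoting helper; the original qq() is not part of the answer.
' Wraps a string in double quotes for display.
Function qq(sTxt)
  qq = """" & sTxt & """"
End Function

WScript.Echo "Looking at", qq("0000000000012345.20120402.pdf")
```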
WRT comments:
All my (ad hoc/proof of concept) scripts start with
Option Explicit
Dim goFS : Set goFS = CreateObject( "Scripting.FileSystemObject" )
and contain some functions dealing with different aspects/strategies for a solution to a common problem/topic. When I post code here, I copy/paste working/tested code out of the middle of a function frame like
' ============================================================================
goXPLLib.Add _
"useDic02", "use a dictionary (Mark II)"
' ----------------------------------------------------------------------------
' ============================================================================
Function useDic02()
  useDic02 = 1 ' assume error
  ' The one and only .pdf folder - no recursion into subfolders!
  ...
  Next
  useDic02 = 0 ' success
End Function ' useDic02
(yes, there is a first attempt function "useDic()" that was guilty of storing all the oCuts for each group to be processed in a second loop; yes, there is a function "createTestData()" I needed to set up/fill my TDir). Sometimes I'm sloppy and forget about goFS, please accept my apologies.
The variable names are part of an experiment. I used to advocate type-prefixed long variable names up to and including
Dim nIdx
For nIdx = 0 To UBound(aNames)
  aNames(nIdx) = ...
Next
Other people argued that nIdx-like variable names just add letters to mistype without adding any meaning over i, and that aNames-like names can't be understood without the context; and if you have that, aN would be just as good a reminder for "The first names of the kings of Persia from the currently processed file to be compared to the names in the database".
So I thought: Given that there are 3 interesting aspects of a file name (full name to copy, number prefix to group, date part to compare/decide) and that there is half a screen between
Private m_sN ' complete file name
and
Public Property Get N() : N = m_sN : End Property ' N
and given that you need just those 3 properties of the Cut object to use it in
Dim oCut : Set oCut = New cCut.cut(reCut, oFile.Name)
If Not dicG.Exists(oCut.G) Then
   ' new group, first file, assume this is the latest
   Set dicG(oCut.G) = oCut
Else
   ' found a better one for this group?
   If dicG(oCut.G).D < oCut.D Then Set dicG(oCut.G) = oCut
will the average short-term memory cope with oCut.D?
Obviously not.
To copy the selected files:
Assuming you want the files copied to an existing folder "..\data\latest", use
goFS.CopyFile goFS.BuildPath(sTDir, dicG(sG).N), "..\data\latest\", True
instead of/in addition to the line
WScript.Echo "Last file for", sG, "is", dicG(sG).N
I did not anticipate that .CopyFile chokes on relative source paths; so consider replacing the *N*ame property of the cCut class with a *P*ath property.
Trying to use
dicG(sG).Copy "..\data\latest\", True
results in:
Microsoft VBScript runtime error: Object doesn't support this property or method: 'dicG(...).Copy'
because the objects stored aren't files (which have a .Copy method), but cCuts (which don't).
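If you would rather keep the Name property, another option is to resolve both paths with GetAbsolutePathName before calling CopyFile. This is only a sketch: the Sub name copyLatest is made up for the illustration, and it assumes the target folder already exists.

```vbscript
Option Explicit
Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")

' Hypothetical helper (not in the original answer): copy the file sName
' from sSrcDir into the existing folder sDstDir, overwriting an older copy.
Sub copyLatest(sSrcDir, sDstDir, sName)
  ' resolve relative folders like "..\data\test" against the current directory
  Dim sSrc : sSrc = goFS.GetAbsolutePathName(goFS.BuildPath(sSrcDir, sName))
  Dim sDst : sDst = goFS.BuildPath(goFS.GetAbsolutePathName(sDstDir), sName)
  goFS.CopyFile sSrc, sDst, True
End Sub

' Intended use inside the final loop over dicG.Keys:
'   copyLatest sTDir, "..\data\latest", dicG(sG).N
```

Passing a full destination file path (instead of a folder with a trailing backslash) avoids any doubt about whether CopyFile treats the target as a folder or as a file.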
Upvotes: 2
Reputation: 7500
How I would handle it:
I would build a dictionary with a separate key for each unique number part. The value is an array of all file names sharing that key (and thus sharing the unique number part).
For each key in the dictionary, I would loop through the items in the array, searching for the most recent date.
Approach:
Loop to 1. until all files are handled
Get a key
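A rough sketch of the two-pass idea described above (group first, then search each group). All function names are made up for the illustration, and it assumes the names follow the nnnnnnnnnnnnnnnn.yyyyddmm.pdf layout from the question exactly.

```vbscript
Option Explicit

' Hypothetical helper: build a sortable "yyyymmdd" string from a name
' whose date part is stored as yyyyddmm (characters 18 to 25).
Function sortKey(sName)
  Dim sD : sD = Mid(sName, 18, 8)
  sortKey = Left(sD, 4) & Right(sD, 2) & Mid(sD, 5, 2)
End Function

' Pass 1: group the names by their 16-digit prefix; each value is an array.
Function groupNames(aNames)
  Dim dicG : Set dicG = CreateObject("Scripting.Dictionary")
  Dim sName, sKey, aGrp
  For Each sName In aNames
    sKey = Left(sName, 16)
    If dicG.Exists(sKey) Then
       aGrp = dicG(sKey)                  ' arrays are copied out of the dictionary ...
       ReDim Preserve aGrp(UBound(aGrp) + 1)
       aGrp(UBound(aGrp)) = sName
       dicG(sKey) = aGrp                  ' ... so the grown copy must be stored back
    Else
       dicG(sKey) = Array(sName)
    End If
  Next
  Set groupNames = dicG
End Function

' Pass 2: within one group, pick the name with the most recent date.
Function latestOf(aGrp)
  Dim sName
  latestOf = aGrp(0)
  For Each sName In aGrp
    If sortKey(sName) > sortKey(latestOf) Then latestOf = sName
  Next
End Function

Dim dicAll : Set dicAll = groupNames(Array( _
    "0000000000012345.20120402.pdf" _
  , "0000000000012345.20120502.pdf" _
  , "0000000000012346.20120602.pdf"))
Dim sGrp
For Each sGrp In dicAll.Keys
  WScript.Echo sGrp, "->", latestOf(dicAll(sGrp))
Next
```

Compared with keeping only the current best file per key, this stores every file name before deciding, which costs more memory but leaves the full groups available (for example for the later merge step).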
Upvotes: 2