CLO_471

Reputation: 125

Copy the most recent file from a batch of similar files using VBScript

I am not sure whether this is possible, or even where to begin. I have a couple of thousand files whose names follow this pattern:

nnnnnnnnnnnnnnnn.yyyyddmm.pdf (n = number, yyyy = year, dd = day, and mm = month).

Within these thousands of files, there are batches of similar files that share the same nnnnnnnnnnnnnnnn part of the file name but differ in the .yyyyddmm part, which represents the date of the file. (These batches will be merged at a later point, but that is not important here.)

My question: is there a way to compare the yyyyddmm part of the files in a batch and copy the one with the most recent date to a different folder? For each batch, I need the file whose name carries the most recent date.

I am having trouble with this because I am not sure whether it is possible to compare parts of the file name to determine which file has the most recent date. I know this could be done by looking at the date modified, but that will not always give me the file with the most recent date in its name.

Any thoughts? Please let me know if I can provide more information.

Upvotes: 0

Views: 878

Answers (2)

Ekkehard.Horner

Reputation: 38745

Trying to understand your problem/specs. Assume a loop over the files of your .pdf folder results in:

Looking at "0000000000012345.20120402.pdf"
Looking at "0000000000012345.20120502.pdf"
Looking at "0000000000012348.20121702.pdf"
Looking at "0000000000012346.20120802.pdf"
Looking at "0000000000012347.20121002.pdf"
Looking at "0000000000012348.20121602.pdf"
Looking at "0000000000012347.20121302.pdf"
Looking at "0000000000012347.20121202.pdf"
Looking at "0000000000012345.20120202.pdf"
Looking at "0000000000012348.20121502.pdf"
Looking at "0000000000012346.20120602.pdf"
Looking at "0000000000012346.20120902.pdf"
Looking at "0000000000012348.20121402.pdf"
Looking at "0000000000012346.20120702.pdf"
Looking at "0000000000012347.20121102.pdf"
Looking at "0000000000012345.20120302.pdf"

Would

Last file for 0000000000012345 is 0000000000012345.20120502.pdf
Last file for 0000000000012348 is 0000000000012348.20121702.pdf
Last file for 0000000000012346 is 0000000000012346.20120902.pdf
Last file for 0000000000012347 is 0000000000012347.20121302.pdf

identify the files to copy correctly? If yes, say so and I will post the code here.

First, you need a class to obtain and store the info put into the file names:

' cut & store info about file(names) like "0000000000012347.20121202.pdf"
Class cCut
  Private m_sN  ' complete file name
  Private m_sG  ' group/number prefix part
  Private m_dtF ' date part; converted to ease comparisons
  Public Function cut(reCut, sFiNa)
    Set cut = Me ' return self/this from function
    Dim oMTS : Set oMTS = reCut.Execute(sFiNa)
    If 1 = oMTS.Count Then
       m_sN  = sFiNa
       Dim oSM : Set oSM = oMTS(0).SubMatches
       m_sG  = oSM(0)
       m_dtF = DateSerial(oSM(1), oSM(3), oSM(2))
    Else
       ' no match: the properties stay Empty (consider Err.Raise here)
    End If
  End Function ' cut
  Public Property Get G() : G = m_sG  : End Property ' G
  Public Property Get D() : D = m_dtF : End Property ' D
  Public Property Get N() : N = m_sN  : End Property ' N
End Class ' cCut
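To see the parsing step in isolation, here is a minimal sketch that is not part of the original script. The function name dateFromName is a hypothetical helper introduced only for illustration; it uses the same pattern and the same DateSerial argument order (year, month, day) as the class above, and assumes the yyyyddmm layout described in the question.

```vbscript
Option Explicit

' Hypothetical helper (not from the answer): returns the date encoded in a
' name like "0000000000012347.20121202.pdf", or Null if the name does not match.
Function dateFromName(sFiNa)
  dateFromName = Null
  Dim re : Set re = New RegExp
  re.Pattern = "^(\d{16})\.(\d{4})(\d{2})(\d{2})\.pdf$"
  Dim oMTS : Set oMTS = re.Execute(sFiNa)
  If 1 = oMTS.Count Then
     Dim oSM : Set oSM = oMTS(0).SubMatches
     ' SubMatches: 0 = number prefix, 1 = yyyy, 2 = dd, 3 = mm
     dateFromName = DateSerial(CInt(oSM(1)), CInt(oSM(3)), CInt(oSM(2)))
  End If
End Function

Dim dtA : dtA = dateFromName("0000000000012347.20121202.pdf") ' 12 Feb 2012
Dim dtB : dtB = dateFromName("0000000000012347.20121302.pdf") ' 13 Feb 2012
WScript.Echo "B is later than A:", CStr(dtA < dtB)
```

Once the date part is a real Date value, the plain < and > operators compare the files chronologically, which a comparison of the raw yyyyddmm strings would not do reliably.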

Then just loop over the .Files and check the date parts for each group stored in a dictionary (number prefix part used as key):

  ' The one and only .pdf folder - no recursion into subfolders!
  Dim sTDir : sTDir     = "..\data\test"
  ' dictionary to store the last/most recently used file for each group
  Dim dicG  : Set dicG  = CreateObject("Scripting.Dictionary")
  ' RegExp to cut/parse file names like "0000000000012345.20120402.pdf"
  Dim reCut : Set reCut = New RegExp
  reCut.Pattern = "^(\d{16})\.(\d{4})(\d{2})(\d{2})\.pdf$"
  Dim oFile
  For Each oFile In goFS.GetFolder(sTDir).Files
      WScript.Echo "Looking at", """" & oFile.Name & """"
      ' an oCut object for each file name
      Dim oCut : Set oCut = New cCut.cut(reCut, oFile.Name)
      If Not dicG.Exists(oCut.G) Then
         ' new group, first file, assume this is the latest
         Set dicG(oCut.G) = oCut
      Else
         ' found a better one for this group?
         If dicG(oCut.G).D < oCut.D Then Set dicG(oCut.G) = oCut
      End If
  Next
  WScript.Echo "-----------------------"
  Dim sG
  For Each sG In dicG.Keys
      WScript.Echo "Last file for", sG, "is", dicG(sG).N
  Next
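If you want to try the selection logic without a folder of test files, the following sketch runs the same idea over an in-memory list of names. It is a simplified variation, not the code above: instead of cCut objects it keeps two plain dictionaries (dicN and dicD, names chosen here for illustration), and the sample names are taken from the listing at the top of this answer.

```vbscript
Option Explicit

' Sample names from the listing above (date part is yyyyddmm).
Dim aNames : aNames = Array( _
    "0000000000012345.20120402.pdf", _
    "0000000000012345.20120502.pdf", _
    "0000000000012346.20120902.pdf", _
    "0000000000012345.20120202.pdf", _
    "0000000000012346.20120602.pdf")

Dim reCut : Set reCut = New RegExp
reCut.Pattern = "^(\d{16})\.(\d{4})(\d{2})(\d{2})\.pdf$"

Dim dicN : Set dicN = CreateObject("Scripting.Dictionary") ' group -> latest name
Dim dicD : Set dicD = CreateObject("Scripting.Dictionary") ' group -> latest date

Dim sName, oMTS, oSM, sGrp, dtF
For Each sName In aNames
    Set oMTS = reCut.Execute(sName)
    If 1 = oMTS.Count Then
       Set oSM = oMTS(0).SubMatches
       sGrp = oSM(0)
       dtF  = DateSerial(CInt(oSM(1)), CInt(oSM(3)), CInt(oSM(2)))
       If Not dicD.Exists(sGrp) Then
          ' new group, first file, assume this is the latest
          dicD(sGrp) = dtF
          dicN(sGrp) = sName
       ElseIf dicD(sGrp) < dtF Then
          ' found a later one for this group
          dicD(sGrp) = dtF
          dicN(sGrp) = sName
       End If
    End If ' names that do not match the pattern are skipped
Next

Dim sG
For Each sG In dicN.Keys
    WScript.Echo "Last file for", sG, "is", dicN(sG)
Next
```

Skipping non-matching names is a choice made for this sketch; the class above leaves that case open.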

With regard to the comments:

All my (ad hoc/proof of concept) scripts start with

Option Explicit
Dim goFS     : Set goFS = CreateObject( "Scripting.FileSystemObject" )

and contain some functions dealing with different aspects/strategies for a solution to a common problem/topic. When I post code here, I copy/paste working/tested code out of the middle of a function frame like

' ============================================================================
goXPLLib.Add _
  "useDic02", "use a dictionary (Mark II)"
' ----------------------------------------------------------------------------
' ============================================================================
Function useDic02()
  useDic02 = 1 ' assume error

  ' The one and only .pdf folder - no recursion into subfolders!
  ...
  Next

  useDic02 = 0 ' success
End Function ' useDic02

(yes, there is a first attempt function "useDic()" that was guilty of storing all the oCuts for each group to be processed in a second loop; yes, there is a function "createTestData()" I needed to set up/fill my TDir). Sometimes I'm sloppy and forget about goFS, please accept my apologies.

The variable names are part of an experiment. I used to advocate type-prefixed long variable names up to and including

Dim nIdx
For nIdx = 0 To UBound(aNames)
    aNames(nIdx) = ...
Next

Other people argued that variables like nIdx just add some letters to mistype but no additional meaning over i, and that names like aNames can't be understood without the context; if you have that context, aN would be just as good a reminder of "The first names of the kings of Persia from the currently processed file to be compared to the names in the database".

So I thought: Given that there are 3 interesting aspects of a file name (full name to copy, number prefix to group, date part to compare/decide) and that there is half a screen between

  Private m_sN  ' complete file name

and

  Public Property Get N() : N = m_sN  : End Property ' N

and given that you need just those 3 properties of the Cut object to use it in

  Dim oCut : Set oCut = New cCut.cut(reCut, oFile.Name)
  If Not dicG.Exists(oCut.G) Then
     ' new group, first file, assume this is the latest
     Set dicG(oCut.G) = oCut
  Else
     ' found a better one for this group?
     If dicG(oCut.G).D < oCut.D Then Set dicG(oCut.G) = oCut

will the average short-term memory cope with oCut.D?

Obviously not.

To copy the selected files:

Assuming you want the files copied to an existing folder "..\data\latest", use

goFS.CopyFile goFS.BuildPath(sTDir, dicG(sG).N), "..\data\latest\", True

instead of/in addition to the line

WScript.Echo "Last file for", sG, "is", dicG(sG).N

I did not anticipate that .CopyFile chokes on relative source paths, so consider replacing the *N*ame property of the cCut class with a *P*ath property.
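One possible way to get a full source path, sketched here under the assumption that resolving the relative path is enough to keep .CopyFile happy, is the FileSystemObject's GetAbsolutePathName method. The function name fullSourcePath is a hypothetical helper, not something from the script above.

```vbscript
Option Explicit
Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")

' Hypothetical helper: absolute path of file sFiNa in the (possibly
' relative) folder sDir, resolved against the current directory.
Function fullSourcePath(sDir, sFiNa)
  fullSourcePath = goFS.GetAbsolutePathName(goFS.BuildPath(sDir, sFiNa))
End Function

WScript.Echo fullSourcePath("..\data\test", "0000000000012345.20120502.pdf")
```

The copy line could then pass fullSourcePath(sTDir, dicG(sG).N) as the source. Storing that value in the class when the file name is cut would give you the Path property mentioned above.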

Trying to use

dicG(sG).Copy "..\data\latest\", True

results in:

Microsoft VBScript runtime error: Object doesn't support this property or method: 'dicG(...).Copy'

because the objects stored aren't files (which have a .Copy method), but cCuts (which don't).

Upvotes: 2

AutomatedChaos

Reputation: 7500

How I would handle it:

  1. I would build a dictionary with a separate key for each unique number part. The value is an array of all file names sharing that key (and thus the same number part).

  2. For each key in the dictionary, I would loop through the items in the array, searching for the most recent date.

Approach:

  1. Get a file
  2. Extract the number part
  3. See if a key for that number part exists. If not, create a key for that number with an empty array as its value
  4. Add the file name as a new item to the array
  5. Loop to 1. until all files are handled

  6. Get a key

  7. Get the first file in the attached array. Remember its date and array index
  8. Get the next file; if its date is later than the remembered date, update the remembered date and array index to those of this file
  9. Loop to 8. until the end of the array is reached
  10. Store the file at the remembered array index as the most recent file for that unique number
  11. Loop to 6. until all keys are handled
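The steps above could be sketched in VBScript roughly as follows. This is only an illustration under some assumptions: the names come from an in-memory array rather than a folder, every name follows the number.yyyyddmm.pdf layout from the question, and dateOf, dicG and dicLatest are names made up for this sketch.

```vbscript
Option Explicit

Dim aNames : aNames = Array( _
    "0000000000012347.20121002.pdf", _
    "0000000000012348.20121602.pdf", _
    "0000000000012347.20121302.pdf", _
    "0000000000012347.20121202.pdf", _
    "0000000000012348.20121702.pdf")

' Hypothetical helper: yyyyddmm part of the name -> Date
Function dateOf(sFiNa)
  Dim sD : sD = Split(sFiNa, ".")(1)
  dateOf = DateSerial(CInt(Left(sD, 4)), CInt(Right(sD, 2)), CInt(Mid(sD, 5, 2)))
End Function

' Steps 1-5: group the file names by their number part
Dim dicG : Set dicG = CreateObject("Scripting.Dictionary")
Dim sName, sKey, aGroup
For Each sName In aNames
    sKey = Split(sName, ".")(0)
    If Not dicG.Exists(sKey) Then dicG(sKey) = Array()
    aGroup = dicG(sKey)                        ' arrays are copied on assignment
    ReDim Preserve aGroup(UBound(aGroup) + 1)
    aGroup(UBound(aGroup)) = sName
    dicG(sKey) = aGroup                        ' so write the grown copy back
Next

' Steps 6-11: find the most recent file in each group
Dim dicLatest : Set dicLatest = CreateObject("Scripting.Dictionary")
Dim nIdx, nBest
For Each sKey In dicG.Keys
    aGroup = dicG(sKey)
    nBest = 0                                  ' step 7: start with the first file
    For nIdx = 1 To UBound(aGroup)             ' steps 8-9
        If dateOf(aGroup(nIdx)) > dateOf(aGroup(nBest)) Then nBest = nIdx
    Next
    dicLatest(sKey) = aGroup(nBest)            ' step 10
    WScript.Echo "Last file for", sKey, "is", dicLatest(sKey)
Next
```

Because VBScript copies arrays on assignment, the array has to be read from the dictionary, grown, and stored back; modifying it in place inside the dictionary does not work.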

Upvotes: 2
