Norman Ramsey
Norman Ramsey

Reputation: 202505

How can I predict the size of an ISO 9660 filesystem?

I'm archiving data to DVD, and I want to pack the DVDs full. I know the names and sizes of all the files I want on the DVD, but I don't know how much space is taken up by metadata. I want to get as many files as possible onto each DVD, so I'm using a Bubblesearch heuristic with greedy bin-packing. I try 10,000 alternatives and get the best one. Currently I know the sizes of all the files and because I don't know how files are stored in an ISO 9660 filesystem, I add a lot of slop for metadata. I'd like to cut down the slop.

I could use genisoimage -print-size except it is too slow---given 40,000 files occupying 500MB, it takes about 3 seconds. Taking 8 hours per DVD is not in the cards. I've modified the genisoimage source before and am really not keen to try to squeeze the algorithm out of the source code; I am hoping someone knows a better way to get an estimate or can point me to a helpful specification.


Clarifying the problem and the question:


Note: these files are on a machine with 3TB of hard-drive storage. In all cases the average size of the files is at least 10MB; sometimes it is significantly larger. So it is possible that genisomage will be fast enough after all, but I doubt it---it appears to work by writing the ISO image to /dev/null, and I can't imagine that will be fast enough when the image size approaches 4.7GB. I don't have access to that machine right now, or when I posted the original question. When I do have access in the evening I will try to get better numbers for the question. But I don't think genisomage is going to be a good solution---although it might be a good way to learn a model of the filesystem that tells me how how it works. Knowing that block size is 2KB is already helpful.

It may also be useful to know that files in the same directory are burned to the samae DVD, which simplifies the search. I want access to the files directly, which rules out tar-before-burning. (Most files are audio or video, which means there's no point in trying to hit them with gzip.)

Upvotes: 7

Views: 3905

Answers (6)

Jean-Marc
Jean-Marc

Reputation: 109

I have to save SQL backups onto DVDs ranging from 200KB to 3GB. I only have about 30 files to do every week. So, using MS-Access I created this module.

I calculate a combination of files and their sizes then pick the best combo by filling the DVD sequentially or up and down.

     Option Compare Database
     Option Explicit
     
     ' arranging SQL back files to fit on as fewest DVD as possible is a bin packing problem
     ' This is a bin packing problem - no one best solution - a NP Hard problem!
     '
     'DVD-R single layer
     '2,298,496 sectors, 4,707,319,808 longs
     'subtract 10MB overhead = 4,696,834,048 longs
     Public Const DVD_CAPACITY As Double = 4500000000# '' rounding down to a conservative number of bytes per DVD disc (4.37GB ~ 4,692,251,771)
     
     '' make list of files per disc
     Function makeList(wk As Long)
     Const KB As Double = 1024# '' a KB in longs
     Dim rs As DAO.Recordset, txt As String '' retrieve the data from the Access table
     Dim tfs As Long, filSz As Double, totsz As Double ''total files, tmp file size, total size of files
     Dim bLb As Long '' bins lower bound
     Dim discAvg As Double, dvdMax As Double '' setting a max size per disc
     Dim itr As Long, seq As Long, best As Long  '' iteration counter of the main loop, bin sequencer, best iteration
     Dim inc As Long, xtr As Long '' incremental per loop ,extra files not packed in bin (remainder)
     Dim bPtr As Long, fPtr As Long  '' bin/disc pointer, file pointer
     Dim i As Long, j As Long, col As Long, tm As Single  '' general purpose counter,  array pointer file, time keeper
     Dim smpl(10) As Long, updn(10) As Boolean  '' setup for each iteration
     Dim files As Variant '' list of files from recordset
     Dim fAlloc() As Long '' into which bin is each file allocated
     Dim maxS(10) As Double, remd(10) As Long '' track bin size and file remaining after each iteration
     Dim discPtr() As Long '' array pointer to the next empty discFile() slot
     Dim discSize() As Double '' total file size allocated in each disc, size loaded
     Dim discFile() As Long '' the bins to pack, array pointers to files() for each file in each disc
     Dim discSpace As Double
     Dim tCbo As Long, comboMax(9) As Long, comboUsed() As Boolean, combos() As Integer, comboSize() As Double   '' calculated combinations - store fileID and combo size, total combos
     tm = Timer
     smpl(0) = 7: smpl(1) = 7: smpl(2) = 6: smpl(3) = 6: smpl(4) = 5: smpl(5) = 5: smpl(6) = 4: smpl(7) = 4: smpl(8) = 3: smpl(9) = 3
     updn(0) = 0: updn(1) = -1: updn(2) = 0: updn(3) = -1: updn(4) = 0: updn(5) = -1: updn(6) = 0: updn(7) = -1: updn(8) = 0: updn(9) = -1
     ''////// GET FILE LIST FOR THE WEEK WE ARE BACKING UP
     '' get a list of files sorted by size descending
     txt = "SELECT [fileid],[filesz] from [files_per_week] where ([weekid]=" & wk & ") order by [filesz] desc;"
     '',[filename],[filedt] - not required.
     Set rs = CurrentDb.OpenRecordset(txt, dbOpenDynaset)
     files = rs.GetRows(200)
     '' files( column, row )
     '' columns: 0[fileid] long, 1[filesz] double - is all we need
     rs.Close
     tfs = UBound(files, 2) '' tfs + 1 = total files
     ReDim fAlloc(tfs) '' file to bin allocation
     For i = 0 To tfs '' for each file
         filSz = files(1, i) / 2048# '' how many sectors
         '' adjust file size for DVD (blocks of 2048 + overhead) and add them up
         If (filSz - Int(filSz)) > 0 Then
         filSz = Int(filSz) + 2 '' add partial + VAT descriptor
         Else
         filSz = Int(filSz) + 1
         End If
         files(1, i) = filSz * 2048#
         totsz = totsz + files(1, i)
     Next
     bLb = Int(totsz / DVD_CAPACITY) + 1 '' minimum/optimal number of bins/disc
     discAvg = Int(totsz / bLb) + 4 '' optimum avg size per disc (nearest word)
     dvdMax = discAvg + (KB * KB) '' start with 1MB of slack   files(1, tfs) '' add the smallest file to avg; appears to create a better distribution
     If dvdMax > DVD_CAPACITY Then
          dvdMax = DVD_CAPACITY '' dont go beyond that limit
     End If
     Debug.Print "Total file sizes " & Format(totsz / KB, "#,###") & " KB for week " & wk & " Optimum num of Bins " & bLb
     Debug.Print " DVD Size = " & Format(DVD_CAPACITY / KB, "#,###") & " KB. "
     Debug.Print " Avg / disc " & Format(discAvg / KB, "#,###") & " KB. "
     Debug.Print " Set at max " & Format(dvdMax / KB, "#,###") & " KB."
     ''//////// START PACKING BINS
     '' set number of bin to blb - can redim last dimension to higher # of bin
     bLb = bLb - 1 '' shift for array pointing
     ReDim discFile(tfs, bLb) '' discfile( spot, bin) to ID of item - discfile for fileID for each file on disc
     ReDim discPtr(bLb) '' discPtr( bin) to next empty spot - pointer to the next available space on the packing list
     ReDim discSize(bLb) '' discsize (bin) load of bin - total filesize loaded on disc
     '' TRY 10 DIFFERENT COMBOS AND FILLING SCENARIOS
     best = -1
     For itr = 0 To 10
         If itr = 10 Then
         Debug.Print "Final Iteration."
         End If
         '' RESET DISC MANIFEST AND FILE ALLOCATION ARRAYS
         For j = 0 To bLb
             For i = 0 To tfs
                 discFile(i, j) = -1
                 fAlloc(i) = -1 '' gets repeated - not a huge overhead
             Next
             discPtr(j) = 0
             discSize(j) = 0
         Next
         '' PLACE A FILE IN EACH BIN STARTING WITH LARGEST ITEM, ordered largest to smallest.
         fPtr = 0 '' file count/pointer
         For bPtr = 0 To bLb   '' for each bin
         ''    Do While discSize(bPtr) < (dvdMax / 2) '' until filled at 50% or more - not that efficient use of space
                 discFile(discPtr(bPtr), bPtr) = fPtr 
                 discPtr(bPtr) = discPtr(bPtr) + 1 '' point to next avail slot
                 discSize(bPtr) = discSize(bPtr) + files(1, fPtr)
                 fAlloc(fPtr) = i
                 fPtr = fPtr + 1
         ''    Loop
         Next
         '' CREATE A LIST OF FILE COMBINATIONS (UP TO 7) TO MAKE A BEST FIT
         If tCbo = 0 Then
             '' do once - number of files will not change
             tCbo = createComboList(fPtr, tfs, 7, combos, comboSize, files, comboMax) '' (tot files lbound, ubound, max sample, 4 arrays)
             ReDim comboUsed(tCbo)
     ''        '' check max sizes
     ''        For i = 2 To 9
     ''            j = comboMax(i - 1) + 1
     ''            ''If j < fPtr Then j = fPtr
     ''            If j >= fPtr And j <= tCbo Then
     ''            Debug.Print i, Format(comboSize(j), "#,###")
     ''            End If
     ''        Next
         Else
             '' reset the comboUsed
             For i = 0 To comboMax(smpl(itr))
                 comboUsed(i) = False
             Next
         End If
         '' CONTINUE FILLING THE BINS
         col = UBound(combos) '' find the combos number of columns (sample size)
         inc = 0 '' files added per bin sweep.
         If updn(itr) = True Then '' we go up and down the bins: 0,1,2,2,1,0,0,1,2...
             bPtr = 0 '' start at the last bin
             seq = -1 '' go down
         Else
             bPtr = 0 '' start at the first bin
             seq = 1 '' initial bin step - either 1 or -1
         End If
         Do ''repeat for each bin - bin step determined below.
             discSpace = dvdMax - discSize(bPtr) '' remaining space
             filSz = files(1, tfs) - 2048#: fPtr = -1 '' temp size var (no combo is smaller than smallest file); f = combo found
             '' FIND A SET OF FILES TO PACK NEXT
             For i = LBound(comboSize) To comboMax(smpl(itr))  '' check all combos
                 If comboUsed(i) = False Then
                     '' theres room for this combo
                     If comboSize(i) < discSpace And comboSize(i) > filSz Then '' is this the largest so far
                         '' is any file id in that combo allready allocated
                         For j = 0 To col
                             If combos(j, i) > -1 Then
                                 '' we have a file pointer
                                 If fAlloc(combos(j, i)) > -1 Then
                                 '' that file was loaded in a bin
                                     comboUsed(i) = True '' remove combofrom future search
                                     Exit For
                                 End If
                             End If
                         Next
                         If comboUsed(i) = False Then '' if all files of this combo are available
                             filSz = comboSize(i) '' save that size
                             fPtr = i '' save the ptr to the combo
                         End If '' else continue search
                     End If
                 End If
             Next
             '' PUT FILES IN BIN
             '' combos(x,fPtr),comboSize(fPtr) will have the best combo to put in bin
             If fPtr > -1 Then '' found a file combo
                 '' put each file in the bin
                 For j = 0 To col
                     If combos(j, fPtr) > -1 Then
                         '' we have a fileID
                         discFile(discPtr(bPtr), bPtr) = combos(j, fPtr) '' load file pointer in slot
                         discSize(bPtr) = discSize(bPtr) + files(1, combos(j, fPtr)) '' add size to load
                         fAlloc(combos(j, fPtr)) = bPtr   '' file was allocated a bin
                         discPtr(bPtr) = discPtr(bPtr) + 1 '' point to next slot
                         inc = inc + 1 '' count the number of fill action
                     End If
                 Next
                 '' this combo was used
                 comboUsed(fPtr) = True 'remove from search
             End If
             '' ANY REMAINING FILES TO PACK?
             xtr = 0
             For i = 0 To tfs
                 If fAlloc(i) < 0 Then
                     xtr = xtr + 1 '' a file remains unpacked
                 End If
             Next
             '' MOVE TO NEXT BIN - which bin/disc to point to next
             bPtr = bPtr + seq '' next bin
             If bPtr > bLb Then '' if ptr past last bin
                 If updn(itr) = True Then '' we go up and down the bins: 0,1,2,2,1,0,0,1,2...
                     seq = seq * -1 '' reverse the bin order
                     bPtr = bPtr + seq
                 Else ' go thru the bins 0,1,2,0,1,2,0,1,2...
                     bPtr = 0 '' back to bin 0
                 End If
             ElseIf bPtr < 0 Then
                 seq = seq * -1 '' reverse the bin order
                 bPtr = bPtr + seq
             End If
             '' if we're back to bin 0
             If bPtr = 0 Then
                 If inc = 0 And xtr > 0 Then
                     ' if no files added this round and there is 1+ file remaining
                     Exit Do '' exit loop with xtr file remaining
                 Else
                     inc = 0 '' reset for another round
                 End If
             End If
             DoEvents
             If (Timer - tm) > 90 Then Stop '' we have a runaway loop
         Loop While xtr > 0 
         filSz = 0
         '' STORE RESULT OF THIS ITERATION
         For i = 0 To bLb
             If discSize(i) > filSz Then filSz = discSize(i)
         Next
         maxS(itr) = filSz
         remd(itr) = xtr
     ''    '' get smallest disc size
     ''    For i = 0 To bLb
     ''        If discSize(i) < filSz Then filSz = discSize(i)
     ''    Next
         '' FINAL ITERATION DONE - PREPARE FOR LAST ITERATION
         If itr = 9 Then ' last iteration completed - prep for final pass or reset
     ''        Debug.Print "   iteration results "
     ''        For i = 0 To 9
     ''            Debug.Print i & "  " & smpl(i), IIf(updn(i), "Up/Dn", "Seq"), maxS(i), remd(i)
     ''        Next
             '' FIND THE BEST COMBO - max slack space
             filSz = DVD_CAPACITY
             For i = 0 To 9
                 '' pick a successful combo that has the most space remaining on disc
                 If maxS(i) <= filSz And remd(i) = 0 Then 'with no file remaining
                     filSz = maxS(i)
                     smpl(10) = smpl(i)
                     updn(10) = updn(i)
                     best = i
                 End If
             Next
             '' if not successful add to size or bin.
             If smpl(10) = 0 Then
                 '' need to increase size available
                 dvdMax = Int(dvdMax * 1.025)  ' add 2.5% of the free space to the operating max
                 If dvdMax > DVD_CAPACITY Then 'if we go over this limit
                     bLb = bLb + 1 ' adding one bin (bLb is 0 based)
                     ReDim discFile(tfs, bLb)  '' discfile for fileID for each file on disc
                     ReDim discPtr(bLb) '' pointer to the next available space on the packing list
                     ReDim discSize(bLb) '' total filesize loaded on disc
                     Debug.Print "*** ADDING A DISC ***"
                     discAvg = Int(totsz / (bLb + 1)) + 4 '' optimum avg size per disc (nearest word)
                     dvdMax = discAvg + (2 * KB * KB) '' start with 2MB of slack
                 Else
                     Debug.Print " - Trying a higher Operating max: " & Format(dvdMax / KB, "#,###") & " KB.";
                     ''discSpace = dvdMax - discAvg  slack space available per disc
                     ''Debug.Print "  Slack space " & Format(discSpace / KB, "#,###") & " KB."
                 End If
                 itr = -1 '' reset the iteration counter
             End If
         End If
     ''    '' TROUBLESHOOTHING
     ''    If itr > -1 Then
     ''        For i = 0 To bLb
     ''            Debug.Print "Try " & itr & " c" & smpl(itr) & updn(itr) & " Disc " & i & ", " & Format(discSize(i), "#,###") & " B. " & discPtr(i) & " fileID ";
     ''            '' print list of files added this round
     ''            For j = 0 To UBound(discFile)
     ''                If discFile(j, i) < 0 Then Exit For
     ''                Debug.Print ","; discFile(j, i);
     ''            Next
     ''            Debug.Print
     ''        Next
     ''    End If
     ''   Debug.Print "Try " & itr & "  remains " & xtr & " slack " & Format(discSpace, "#,###")
     Next
     ''        Debug.Print " Iteration results"
     ''        For i = 0 To 10
     ''            Debug.Print i & "  " & smpl(i), IIf(updn(i), "Up/Dn", "Seq"), maxS(i), remd(i)
     ''        Next
             Debug.Print " Best iteration: #" & best & " - combo " & smpl(10) & "/" & tfs, IIf(updn(10), "Up/Dn", "Seq")
             txt = ""
             For i = 0 To bLb
                 Debug.Print "Disc " & i & ", " & Format(discSize(i), "#,###") & " B. " & discPtr(i) & " file sequence(ID) "
                 txt = txt & discPtr(i) & ", "
     ''            ' print list of files added to this disc
     ''            For j = 0 To UBound(discFile)
     ''                If discFile(j, i) < 0 Then Exit For
     ''                Debug.Print ","; discFile(j, i); ''files(0, discFile(h, i));
     ''            Next
     ''            Debug.Print
             Next
             CurrentDb.Execute "DELETE * FROM comboEfficiency WHERE weekID = " & wk
             CurrentDb.Execute "INSERT INTO comboEfficiency (weekID, calcDT, totFiles, bestSample, binVariation, discAvg, discMax, perDisc)" & _
                 "VALUES(" & wk & ",#" & Now() & "#," & (tfs + 1) & "," & smpl(10) & ",'" & IIf(updn(10), "Up and Down", "Sequential") & "'," & discAvg & "," & maxS(10) & ",'" & Left(txt, Len(txt) - 2) & "')"
     
     Debug.Print "Print and Save manifest for week " & wk
     txt = "SELECT * FROM manifest"
     CurrentDb.Execute "DELETE * FROM manifest WHERE weekID = " & wk
     Set rs = CurrentDb.OpenRecordset(txt, dbOpenDynaset)
     '' reuse txt var
     txt = ""
     For bPtr = 0 To bLb
         ''Debug.Print
         txt = txt & "DICS " & (bPtr + 1) & " Total " & Format(discSize(bPtr) / KB, "#,###") & " KB, with " & discPtr(bPtr) & " files." & vbCrLf
         For i = 0 To tfs
             If discFile(i, bPtr) > -1 Then
                 rs.AddNew
                 rs!weekID = wk ' week no
                 rs!DVD_no = bPtr ' which bin/disc is file going to
                 rs!fileID = files(0, discFile(i, bPtr)) ' the fileID from table [files]
                 rs.Update
             End If
         Next
     Next
     rs.Close
     MsgBox txt
     Debug.Print
     Debug.Print
     End Function
     
    
     Function createComboList(ByRef startAtFile As Long, ByRef totalFiles As Long, ByRef samples As Long, ByRef binItem, ByRef binLoad, ByRef srcItems, ByRef comboCount) As Long
     '' CREATE A LIST OF FILE COMBINATIONS TO MAKE A BEST FIT
     '' startAtFile - lower bound of scrItems array
     '' totalFiles - upper bound of scrItems array
     '' samples - max number of sample r taken from number of files (totalFiles - startAtFile)
     '' binItem - table array containing pointers to srcItems() - one row per combination
     '' binLoad - total estimated size for each combination
     '' srcItems - table array of files ( fileID, Size)
     '' comboCount - count of combos per sample
     '' general purpose counter,  array pointer file,
     Dim a As Long, b As Long, c As Long, d As Long, e As Long, f As Long, g As Long, h As Long, i As Long, j As Long, k As Long, n As Long
     Dim dm As Long '' samples 0 based
     Dim dc As Long '' sample counter 0 based
     Dim t0 As Long '' srcItems lbound
     Dim tf As Long '' tot files- ubound
     Dim cb As Long '' number of combos
     Dim tm As Single, t2 As Single '' time keepers
     Dim timelim As Integer ' how many seconds to wait
     Dim r As Double
     timelim = 2
     '' minimum requirement: 2 to 8 samples, and a minimum number of files
     If samples < 2 Or samples > 9 Or totalFiles <= samples Then Exit Function
     ''samples = 7: totalFiles = 32 ' TESTING
     tm = Timer
     dm = samples - 1
     t0 = startAtFile
     tf = totalFiles
     '' to find the best combination to fill the remaining space make a list of possibilities
     cb = tf  'add the single files (combo(tf,1))
         ''Adding  32  combos, total  33 , took 0.078125  sec.
     For dc = 2 To samples
         ''there is an undocumented hard limit for array around 512MB
         If ((cb + combo(tf - t0 + 1, dc)) * dc * 2) > 134217728# Then '' to reduce processing time lets use 128MB
         dc = dc - 1
         Exit For
         End If
         cb = cb + combo(tf - t0 + 1, dc) '' how many combinations taken dc at a time
     Next
     dm = dc - 1
         '' Total combinations 7 out of 32 = 4514872, use approx. 96,999 KB of RAM. 4514872 combos took 2.375 sec
     ReDim binItem(dm, t0 To cb), binLoad(t0 To cb)
     ''Debug.Print " Estimate total combinations "; cb; " taken "; dm; " at a time, will use approx. "; Format((cb * ((dm * 2))), "#,###"); " bytes of RAM"
     n = cb
     cb = t0
     '' single file
     For a = t0 To tf
         binItem(0, cb) = a
         For dc = 1 To dm: binItem(dc, cb) = -1: Next
         binLoad(cb) = srcItems(1, a)
         cb = cb + 1
     Next
     comboCount(1) = cb - 1
     If (Timer - tm) > 1 Then Exit Do
     '' combine 2
     If dm >= 1 Then
     For a = t0 To (tf - 1)
         For b = (a + 1) To tf
             binItem(0, cb) = a: binItem(1, cb) = b
             For dc = 2 To dm: binItem(dc, cb) = -1: Next
             binLoad(cb) = srcItems(1, a) + srcItems(1, b)
             cb = cb + 1
         Next
     Next
     comboCount(2) = cb - 1
     If (t2 * 3) > timelim Then Exit Do
     End If
     ' combine 3
     If dm > 2 Then
     For a = t0 To (tf - 2)
         For b = (a + 1) To (tf - 1)
             For c = (b + 1) To tf
                 binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c
                 For dc = 3 To dm: binItem(dc, cb) = -1: Next
                 binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, b)
                 cb = cb + 1
             Next
         Next
     Next
     comboCount(3) = cb - 1
     t2 = (Timer - tm)
     If (t2 * 3) > timelim Then Exit Do
     End If
     ' combine 4
     If dm > 3 Then
     For a = t0 To (tf - 3)
         For b = (a + 1) To (tf - 2)
             For c = (b + 1) To (tf - 1)
                 For d = (c + 1) To tf
                     binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d
                     For dc = 4 To dm: binItem(dc, cb) = -1: Next
                     binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d)
                     cb = cb + 1
                 Next
             Next
         Next
     Next
     comboCount(4) = cb - 1
     t2 = (Timer - tm)
     If (t2 * 3) > timelim Then Exit Do
     End If
     ' combine 5
     If dm > 4 Then
     For a = t0 To (tf - 4)
         For b = (a + 1) To (tf - 3)
             For c = (b + 1) To (tf - 2)
                 For d = (c + 1) To (tf - 1)
                     For e = (d + 1) To tf
                         binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d: binItem(4, cb) = e
                         For dc = 5 To dm: binItem(dc, cb) = -1: Next
                         binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d) + srcItems(1, e)
                         cb = cb + 1
                     Next
                 Next
             Next
         Next
     Next
     comboCount(5) = cb - 1
     t2 = (Timer - tm)
     If (t2 * 3) > timelim Then Exit Do
     End If
     ' combine 6
     If dm > 5 Then
     For a = t0 To (tf - 5)
         For b = (a + 1) To (tf - 4)
             For c = (b + 1) To (tf - 3)
                 For d = (c + 1) To (tf - 2)
                     For e = (d + 1) To (tf - 1)
                         For f = (e + 1) To tf
                             binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d: binItem(4, cb) = e: binItem(5, cb) = f
                             For dc = 6 To dm: binItem(dc, cb) = -1: Next
                             binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d) + srcItems(1, e) + srcItems(1, f)
                             cb = cb + 1
                         Next
                     Next
                 Next
             Next
         Next
     Next
     comboCount(6) = cb - 1
     t2 = (Timer - tm)
     If (t2 * 3) > timelim Then Exit Do
     End If
     ' combine 7
     If dm > 6 Then
     For a = t0 To (tf - 6)
         For b = (a + 1) To (tf - 5)
             For c = (b + 1) To (tf - 4)
                 For d = (c + 1) To (tf - 3)
                     For e = (d + 1) To (tf - 2)
                         For f = (e + 1) To (tf - 1)
                             For g = (f + 1) To tf
                                 binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d: binItem(4, cb) = e: binItem(5, cb) = f: binItem(6, cb) = g
                                 For dc = 7 To dm: binItem(dc, cb) = -1: Next
                                 binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d) + srcItems(1, e) + srcItems(1, f) + srcItems(1, g)
                                 cb = cb + 1
                             Next
                         Next
                     Next
                 Next
             Next
         Next
     Next
     comboCount(7) = cb - 1
     t2 = (Timer - tm)
     If (t2 * 3) > timelim Then Exit Do
     End If
     ' combine 8
     If dm > 7 Then
     For a = t0 To (tf - 7)
         For b = (a + 1) To (tf - 6)
             For c = (b + 1) To (tf - 5)
                 For d = (c + 1) To (tf - 4)
                     For e = (d + 1) To (tf - 3)
                         For f = (e + 1) To (tf - 2)
                             For g = (f + 1) To (tf - 1)
                                 For h = (g + 1) To tf
                                     binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d: binItem(4, cb) = e: binItem(5, cb) = f: binItem(6, cb) = g: binItem(7, cb) = h
                                     For dc = 8 To dm: binItem(dc, cb) = -1: Next
                                     binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d) + srcItems(1, e) + srcItems(1, f) + srcItems(1, g) + srcItems(1, h)
                                     cb = cb + 1
                                 Next
                             Next
                         Next
                     Next
                 Next
             Next
         Next
     Next
     comboCount(8) = cb - 1
     t2 = (Timer - tm)
     If (t2 * 3) > timelim Then Exit Do
     End If
     ' combine 9
     If dm > 8 Then
     For a = t0 To (tf - 8)
         For b = (a + 1) To (tf - 7)
             For c = (b + 1) To (tf - 6)
                 For d = (c + 1) To (tf - 5)
                     For e = (d + 1) To (tf - 4)
                         For f = (e + 1) To (tf - 3)
                             For g = (f + 1) To (tf - 2)
                                 For h = (g + 1) To (tf - 1)
                                     For i = (h + 1) To tf
                                         binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d: binItem(4, cb) = e: binItem(5, cb) = f: binItem(6, cb) = g: binItem(7, cb) = h: binItem(8, cb) = i
                                         For dc = 9 To dm: binItem(dc, cb) = -1: Next
                                         binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d) + srcItems(1, e) + srcItems(1, f) + srcItems(1, g) + srcItems(1, h) + srcItems(1, i)
                                         cb = cb + 1
                                     Next
                                 Next
                             Next
                         Next
                     Next
                 Next
             Next
         Next
     Next
     comboCount(9) = cb - 1
     t2 = (Timer - tm)
     'If (t2 * 3) > 1 Then Exit Do
     End If
     Exit Do ' execute loop only once
     Loop
     cb = cb - 1
     createComboList = cb ' return the ubound of the combo array
     If (cb) <> n Then  ' if we skip a level due to time
     ReDim Preserve binItem(dm, cb), binLoad(cb)
     End If
     Debug.Print " Total combinations"; dm; "out of"; totalFiles; "= " & n & ", use approx. "; Format((n * (8 + (samples * 2))) / 1024, "#,###"); " KB of RAM. " & cb & " combos took " & (Timer - tm) & " sec"
     End Function
     
     
     Function combo(ByVal n As Long, ByVal r As Long) As Long
     Dim a As Double, b As Long, i As Long
     If r > 0 And n > r Then
     'C(n,r) = n! / (r! ( n - r )! )
         a = 1
         b = 1
         For i = 1 To r
             a = a * n
             n = n - 1
             b = b * i
         Next
         combo = a / b
     
     End If
     End Function
'''

Upvotes: 0

Sarah Happy
Sarah Happy

Reputation: 283

I recently ran an experiment to find a formula to do a similar filling estimate on dvds, and found a simple formula given some assumptions... from your original post this formula will likely be a low number for you, it sounds like you have multiple directories and longer file names.

Assumptions:

  • all the files are exactly 8.3 characters.
  • all the files are in the root directory.
  • no extensions such as Joliet.

The formula:

174 + floor(count / 42) + sum( ceil(file_size / 2048) )
  • count is the number of files
  • file_size is each file's size in bytes
  • the result is in 2048 byte blocks.

An example script:

#!/usr/bin/perl -w
use strict;
use POSIX;

sub sum {
    my $out = 0;
    for(@_) {
        $out += $_;
    }
    return $out;
}

my @sizes = ( 2048 ) x 1000;
my $file_count = @sizes;

my $data_size = sum(map { ceil($_ / 2048) } @sizes);
my $dir_size = floor( $file_count / 42 ) + 1;
my $overhead = 173;

my $size = $overhead + $dir_size + $data_size;

$\ = "\n";
print $size;

I verified this on disks with up to 150k files, with sizes ranging from 200 bytes to 1 MiB.

Upvotes: 1

j_random_hacker
j_random_hacker

Reputation: 51226

Thanks for the detailed update. I'm satisfied that your current bin-packing strategy is pretty efficient.

As to the question, "Exactly how much overhead does an ISO 9660 filesystem pack on for n files totalling b bytes?" there are only 2 possible answers:

  1. Someone has already written an efficient tool for measuring exactly this. A quick Google search turned up nothing however which is discouraging. It's possible someone on SO will respond with a link to their homebuilt tool, but if you get no more responses for a few days then that's probably out too.
  2. You need to read the readily available ISO 9660 specs and build such a tool yourself.

Actually, there is a third answer:

(3) You don't really care about using every last byte on each DVD. In that case, grab a small representative handful of files of different sizes (say 5), pad them till they are multiples of 2048 bytes, and put all 2^5 possible subsets through genisoimage -print-size. Then fit the equation nx + y = iso_size - total_input_size on that dataset, where n = number of files in a given run, to find x, which is the number of bytes of overhead per file, and y, which is the constant amount of overhead (the size of an ISO 9660 filesystem containing no files). Round x and y up and use that formula to estimate your ISO filesystem sizes for a given set of files. For safety, make sure you use the longest filenames that appear anywhere in your collection for the test filenames, and put each one under a separate directory hierarchy that is as deep as the deepest hierarchy in your collection.

Upvotes: 2

Norman Ramsey
Norman Ramsey

Reputation: 202505

Nice thinking, J. Random. Of course I don't need every last byte, this is mostly for fun (and bragging rights at lunch). I want to be able to type du at the CD-ROM and have it very close to 4700000000.

I looked at the ECMA spec but like most specs it's medium painful and I have no confidence in my ability to get it right. Also it appears not to discuss Rock Ridge extensions, or if it does, I missed it.

I like your idea #3 and think I will carry it a bit further: I'll try to build a fairly rich model of what's going on and then use genisoimage -print-size on a number of filesets to estimate the parameters of the model. Then I can use the model to do my estimation. This is a hobby project so it will take a while, but I will get around to it eventually. I will post an answer here to say how much wastage is eliminated!

Upvotes: 0

j_random_hacker
j_random_hacker

Reputation: 51226

I'm not sure exactly how you are currently doing this -- according to my googling, "Bubblesearch" refers to a way to choose an ordering of items that is in some sense near a greedy ordering, but in your case, the order of adding files to a DVD does not change the space requirements so this approach wastes time considering multiple different orders that amount to the same set of files.

In other words, if you are doing something like the following to generate a candidate file list:

  1. Randomly shuffle the list of files.
  2. Starting from the top of the list, greedily choose all files that you estimate will fit on a DVD until no more will.

Then you are searching the solution space inefficiently -- for any final candidate set of n files, you are potentially considering all n! ways of producing that set. My suggestion:

  1. Sort all files in decreasing order of file size.
  2. Mark the top (largest) file as "included," and remove it from the list. (It must be included on some DVD, so we might as well include it now.)
  3. Can the topmost file in the list be included without the (estimated) ISO filesystem size exceeding the DVD capacity? If so:
    • With probability p (e.g. p = 0.5), mark the file as "included".
  4. Remove the topmost file from the list.
  5. If the list is now empty, you have a candidate list of files. Otherwise, goto 3.

Repeat this many times and choose the best file list.

Tyler D's suggestion is also good: if you have ~40000 files totalling ~500Mb, that means an average file size of 12.5Kb. ISO 9660 uses a block size of 2Kb, meaning those files are wasting on average 1Kb of disk space, or about 8% of their size. So packing them together with tar first will save around 8% of space.

Upvotes: 2

Tyler D
Tyler D

Reputation: 141

Can't use tar to store the files on disk? It's unclear if you're writing a program to do this, or simply making some backups.

Maybe do some experimentation and err on the side of caution - some free space on a disk wouldn't hurt.

Somehow I imagine you've already considered these, or that my answer is missing the point.

Upvotes: 1

Related Questions