Reputation: 202505
I'm archiving data to DVD, and I want to pack the DVDs full. I know the names and sizes of all the files I want on the DVD, but I don't know how much space is taken up by metadata. I want to get as many files as possible onto each DVD, so I'm using a Bubblesearch heuristic with greedy bin-packing. I try 10,000 alternatives and get the best one. Currently I know the sizes of all the files and because I don't know how files are stored in an ISO 9660 filesystem, I add a lot of slop for metadata. I'd like to cut down the slop.
I could use genisoimage -print-size
except it is too slow---given 40,000 files occupying 500MB, it takes about 3 seconds. Taking 8 hours per DVD is not in the cards. I've modified the genisoimage
source before and am really not keen to try to squeeze the algorithm out of the source code; I am hoping someone knows a better way to get an estimate or can point me to a helpful specification.
Clarifying the problem and the question:
I need to burn archives that split across multiple DVDs, typically around five at a time. The problem I'm trying to solve is to decide which files to put on each DVD, so that each DVD (except the last) is as full as possible. This problem is NP-hard.
I'm using the standard greedy packing algorithm where you place the largest file first and you put it in the first DVD having sufficient room. So j_random_hacker, I am definitely not starting from random. I start from sorted and use Bubblesearch to perturb the order in which the files are packed. This procedure improves my packing from around 80% of estimated capacity to over 99.5% of estimated capacity. This question is about doing a better job of estimating the capacity; currently my estimated capacity is lower than real capacity.
I have written a program that tries 10,000 perturbations, each of which involves two steps:
Step 2 is the step I'm trying to improve. At present I am "erring on the side of caution" as Tyler D suggests. But I'd like to do better. I can't afford to use genisomage -print-size
because it's too slow. Similarly, I can't tar the files to disk, because on only is it too slow, but a tar file is not the same size as an ISO 9660 image. It's the size of the ISO 9660 image I need to predict. In principle this could be done with complete accuracy, but I don't know how to do it. That's the question.
Note: these files are on a machine with 3TB of hard-drive storage. In all cases the average size of the files is at least 10MB; sometimes it is significantly larger. So it is possible that genisomage
will be fast enough after all, but I doubt it---it appears to work by writing the ISO image to /dev/null, and I can't imagine that will be fast enough when the image size approaches 4.7GB. I don't have access to that machine right now, or when I posted the original question. When I do have access in the evening I will try to get better numbers for the question. But I don't think genisomage
is going to be a good solution---although it might be a good way to learn a model of the filesystem
that tells me how how it works. Knowing that block size is 2KB is already helpful.
It may also be useful to know that files in the same directory are burned to the samae DVD, which simplifies the search. I want access to the files directly, which rules out tar-before-burning. (Most files are audio or video, which means there's no point in trying to hit them with gzip
.)
Upvotes: 7
Views: 3905
Reputation: 109
I have to save SQL backups onto DVDs ranging from 200KB to 3GB. I only have about 30 files to do every week. So, using MS-Access I created this module.
I calculate a combination of files and their sizes then pick the best combo by filling the DVD sequentially or up and down.
Option Compare Database
Option Explicit
' arranging SQL back files to fit on as fewest DVD as possible is a bin packing problem
' This is a bin packing problem - no one best solution - a NP Hard problem!
'
'DVD-R single layer
'2,298,496 sectors, 4,707,319,808 longs
'subtract 10MB overhead = 4,696,834,048 longs
Public Const DVD_CAPACITY As Double = 4500000000# '' rounding down to a conservative number of bytes per DVD disc (4.37GB ~ 4,692,251,771)
'' make list of files per disc
Function makeList(wk As Long)
Const KB As Double = 1024# '' a KB in longs
Dim rs As DAO.Recordset, txt As String '' retrieve the data from the Access table
Dim tfs As Long, filSz As Double, totsz As Double ''total files, tmp file size, total size of files
Dim bLb As Long '' bins lower bound
Dim discAvg As Double, dvdMax As Double '' setting a max size per disc
Dim itr As Long, seq As Long, best As Long '' iteration counter of the main loop, bin sequencer, best iteration
Dim inc As Long, xtr As Long '' incremental per loop ,extra files not packed in bin (remainder)
Dim bPtr As Long, fPtr As Long '' bin/disc pointer, file pointer
Dim i As Long, j As Long, col As Long, tm As Single '' general purpose counter, array pointer file, time keeper
Dim smpl(10) As Long, updn(10) As Boolean '' setup for each iteration
Dim files As Variant '' list of files from recordset
Dim fAlloc() As Long '' into which bin is each file allocated
Dim maxS(10) As Double, remd(10) As Long '' track bin size and file remaining after each iteration
Dim discPtr() As Long '' array pointer to the next empty discFile() slot
Dim discSize() As Double '' total file size allocated in each disc, size loaded
Dim discFile() As Long '' the bins to pack, array pointers to files() for each file in each disc
Dim discSpace As Double
Dim tCbo As Long, comboMax(9) As Long, comboUsed() As Boolean, combos() As Integer, comboSize() As Double '' calculated combinations - store fileID and combo size, total combos
tm = Timer
smpl(0) = 7: smpl(1) = 7: smpl(2) = 6: smpl(3) = 6: smpl(4) = 5: smpl(5) = 5: smpl(6) = 4: smpl(7) = 4: smpl(8) = 3: smpl(9) = 3
updn(0) = 0: updn(1) = -1: updn(2) = 0: updn(3) = -1: updn(4) = 0: updn(5) = -1: updn(6) = 0: updn(7) = -1: updn(8) = 0: updn(9) = -1
''////// GET FILE LIST FOR THE WEEK WE ARE BACKING UP
'' get a list of files sorted by size descending
txt = "SELECT [fileid],[filesz] from [files_per_week] where ([weekid]=" & wk & ") order by [filesz] desc;"
'',[filename],[filedt] - not required.
Set rs = CurrentDb.OpenRecordset(txt, dbOpenDynaset)
files = rs.GetRows(200)
'' files( column, row )
'' columns: 0[fileid] long, 1[filesz] double - is all we need
rs.Close
tfs = UBound(files, 2) '' tfs + 1 = total files
ReDim fAlloc(tfs) '' file to bin allocation
For i = 0 To tfs '' for each file
filSz = files(1, i) / 2048# '' how many sectors
'' adjust file size for DVD (blocks of 2048 + overhead) and add them up
If (filSz - Int(filSz)) > 0 Then
filSz = Int(filSz) + 2 '' add partial + VAT descriptor
Else
filSz = Int(filSz) + 1
End If
files(1, i) = filSz * 2048#
totsz = totsz + files(1, i)
Next
bLb = Int(totsz / DVD_CAPACITY) + 1 '' minimum/optimal number of bins/disc
discAvg = Int(totsz / bLb) + 4 '' optimum avg size per disc (nearest word)
dvdMax = discAvg + (KB * KB) '' start with 1MB of slack files(1, tfs) '' add the smallest file to avg; appears to create a better distribution
If dvdMax > DVD_CAPACITY Then
dvdMax = DVD_CAPACITY '' dont go beyond that limit
End If
Debug.Print "Total file sizes " & Format(totsz / KB, "#,###") & " KB for week " & wk & " Optimum num of Bins " & bLb
Debug.Print " DVD Size = " & Format(DVD_CAPACITY / KB, "#,###") & " KB. "
Debug.Print " Avg / disc " & Format(discAvg / KB, "#,###") & " KB. "
Debug.Print " Set at max " & Format(dvdMax / KB, "#,###") & " KB."
''//////// START PACKING BINS
'' set number of bin to blb - can redim last dimension to higher # of bin
bLb = bLb - 1 '' shift for array pointing
ReDim discFile(tfs, bLb) '' discfile( spot, bin) to ID of item - discfile for fileID for each file on disc
ReDim discPtr(bLb) '' discPtr( bin) to next empty spot - pointer to the next available space on the packing list
ReDim discSize(bLb) '' discsize (bin) load of bin - total filesize loaded on disc
'' TRY 10 DIFFERENT COMBOS AND FILLING SCENARIOS
best = -1
For itr = 0 To 10
If itr = 10 Then
Debug.Print "Final Iteration."
End If
'' RESET DISC MANIFEST AND FILE ALLOCATION ARRAYS
For j = 0 To bLb
For i = 0 To tfs
discFile(i, j) = -1
fAlloc(i) = -1 '' gets repeated - not a huge overhead
Next
discPtr(j) = 0
discSize(j) = 0
Next
'' PLACE A FILE IN EACH BIN STARTING WITH LARGEST ITEM, ordered largest to smallest.
fPtr = 0 '' file count/pointer
For bPtr = 0 To bLb '' for each bin
'' Do While discSize(bPtr) < (dvdMax / 2) '' until filled at 50% or more - not that efficient use of space
discFile(discPtr(bPtr), bPtr) = fPtr
discPtr(bPtr) = discPtr(bPtr) + 1 '' point to next avail slot
discSize(bPtr) = discSize(bPtr) + files(1, fPtr)
fAlloc(fPtr) = i
fPtr = fPtr + 1
'' Loop
Next
'' CREATE A LIST OF FILE COMBINATIONS (UP TO 7) TO MAKE A BEST FIT
If tCbo = 0 Then
'' do once - number of files will not change
tCbo = createComboList(fPtr, tfs, 7, combos, comboSize, files, comboMax) '' (tot files lbound, ubound, max sample, 4 arrays)
ReDim comboUsed(tCbo)
'' '' check max sizes
'' For i = 2 To 9
'' j = comboMax(i - 1) + 1
'' ''If j < fPtr Then j = fPtr
'' If j >= fPtr And j <= tCbo Then
'' Debug.Print i, Format(comboSize(j), "#,###")
'' End If
'' Next
Else
'' reset the comboUsed
For i = 0 To comboMax(smpl(itr))
comboUsed(i) = False
Next
End If
'' CONTINUE FILLING THE BINS
col = UBound(combos) '' find the combos number of columns (sample size)
inc = 0 '' files added per bin sweep.
If updn(itr) = True Then '' we go up and down the bins: 0,1,2,2,1,0,0,1,2...
bPtr = 0 '' start at the last bin
seq = -1 '' go down
Else
bPtr = 0 '' start at the first bin
seq = 1 '' initial bin step - either 1 or -1
End If
Do ''repeat for each bin - bin step determined below.
discSpace = dvdMax - discSize(bPtr) '' remaining space
filSz = files(1, tfs) - 2048#: fPtr = -1 '' temp size var (no combo is smaller than smallest file); f = combo found
'' FIND A SET OF FILES TO PACK NEXT
For i = LBound(comboSize) To comboMax(smpl(itr)) '' check all combos
If comboUsed(i) = False Then
'' theres room for this combo
If comboSize(i) < discSpace And comboSize(i) > filSz Then '' is this the largest so far
'' is any file id in that combo allready allocated
For j = 0 To col
If combos(j, i) > -1 Then
'' we have a file pointer
If fAlloc(combos(j, i)) > -1 Then
'' that file was loaded in a bin
comboUsed(i) = True '' remove combofrom future search
Exit For
End If
End If
Next
If comboUsed(i) = False Then '' if all files of this combo are available
filSz = comboSize(i) '' save that size
fPtr = i '' save the ptr to the combo
End If '' else continue search
End If
End If
Next
'' PUT FILES IN BIN
'' combos(x,fPtr),comboSize(fPtr) will have the best combo to put in bin
If fPtr > -1 Then '' found a file combo
'' put each file in the bin
For j = 0 To col
If combos(j, fPtr) > -1 Then
'' we have a fileID
discFile(discPtr(bPtr), bPtr) = combos(j, fPtr) '' load file pointer in slot
discSize(bPtr) = discSize(bPtr) + files(1, combos(j, fPtr)) '' add size to load
fAlloc(combos(j, fPtr)) = bPtr '' file was allocated a bin
discPtr(bPtr) = discPtr(bPtr) + 1 '' point to next slot
inc = inc + 1 '' count the number of fill action
End If
Next
'' this combo was used
comboUsed(fPtr) = True 'remove from search
End If
'' ANY REMAINING FILES TO PACK?
xtr = 0
For i = 0 To tfs
If fAlloc(i) < 0 Then
xtr = xtr + 1 '' a file remains unpacked
End If
Next
'' MOVE TO NEXT BIN - which bin/disc to point to next
bPtr = bPtr + seq '' next bin
If bPtr > bLb Then '' if ptr past last bin
If updn(itr) = True Then '' we go up and down the bins: 0,1,2,2,1,0,0,1,2...
seq = seq * -1 '' reverse the bin order
bPtr = bPtr + seq
Else ' go thru the bins 0,1,2,0,1,2,0,1,2...
bPtr = 0 '' back to bin 0
End If
ElseIf bPtr < 0 Then
seq = seq * -1 '' reverse the bin order
bPtr = bPtr + seq
End If
'' if we're back to bin 0
If bPtr = 0 Then
If inc = 0 And xtr > 0 Then
' if no files added this round and there is 1+ file remaining
Exit Do '' exit loop with xtr file remaining
Else
inc = 0 '' reset for another round
End If
End If
DoEvents
If (Timer - tm) > 90 Then Stop '' we have a runaway loop
Loop While xtr > 0
filSz = 0
'' STORE RESULT OF THIS ITERATION
For i = 0 To bLb
If discSize(i) > filSz Then filSz = discSize(i)
Next
maxS(itr) = filSz
remd(itr) = xtr
'' '' get smallest disc size
'' For i = 0 To bLb
'' If discSize(i) < filSz Then filSz = discSize(i)
'' Next
'' FINAL ITERATION DONE - PREPARE FOR LAST ITERATION
If itr = 9 Then ' last iteration completed - prep for final pass or reset
'' Debug.Print " iteration results "
'' For i = 0 To 9
'' Debug.Print i & " " & smpl(i), IIf(updn(i), "Up/Dn", "Seq"), maxS(i), remd(i)
'' Next
'' FIND THE BEST COMBO - max slack space
filSz = DVD_CAPACITY
For i = 0 To 9
'' pick a successful combo that has the most space remaining on disc
If maxS(i) <= filSz And remd(i) = 0 Then 'with no file remaining
filSz = maxS(i)
smpl(10) = smpl(i)
updn(10) = updn(i)
best = i
End If
Next
'' if not successful add to size or bin.
If smpl(10) = 0 Then
'' need to increase size available
dvdMax = Int(dvdMax * 1.025) ' add 2.5% of the free space to the operating max
If dvdMax > DVD_CAPACITY Then 'if we go over this limit
bLb = bLb + 1 ' adding one bin (bLb is 0 based)
ReDim discFile(tfs, bLb) '' discfile for fileID for each file on disc
ReDim discPtr(bLb) '' pointer to the next available space on the packing list
ReDim discSize(bLb) '' total filesize loaded on disc
Debug.Print "*** ADDING A DISC ***"
discAvg = Int(totsz / (bLb + 1)) + 4 '' optimum avg size per disc (nearest word)
dvdMax = discAvg + (2 * KB * KB) '' start with 2MB of slack
Else
Debug.Print " - Trying a higher Operating max: " & Format(dvdMax / KB, "#,###") & " KB.";
''discSpace = dvdMax - discAvg slack space available per disc
''Debug.Print " Slack space " & Format(discSpace / KB, "#,###") & " KB."
End If
itr = -1 '' reset the iteration counter
End If
End If
'' '' TROUBLESHOOTHING
'' If itr > -1 Then
'' For i = 0 To bLb
'' Debug.Print "Try " & itr & " c" & smpl(itr) & updn(itr) & " Disc " & i & ", " & Format(discSize(i), "#,###") & " B. " & discPtr(i) & " fileID ";
'' '' print list of files added this round
'' For j = 0 To UBound(discFile)
'' If discFile(j, i) < 0 Then Exit For
'' Debug.Print ","; discFile(j, i);
'' Next
'' Debug.Print
'' Next
'' End If
'' Debug.Print "Try " & itr & " remains " & xtr & " slack " & Format(discSpace, "#,###")
Next
'' Debug.Print " Iteration results"
'' For i = 0 To 10
'' Debug.Print i & " " & smpl(i), IIf(updn(i), "Up/Dn", "Seq"), maxS(i), remd(i)
'' Next
Debug.Print " Best iteration: #" & best & " - combo " & smpl(10) & "/" & tfs, IIf(updn(10), "Up/Dn", "Seq")
txt = ""
For i = 0 To bLb
Debug.Print "Disc " & i & ", " & Format(discSize(i), "#,###") & " B. " & discPtr(i) & " file sequence(ID) "
txt = txt & discPtr(i) & ", "
'' ' print list of files added to this disc
'' For j = 0 To UBound(discFile)
'' If discFile(j, i) < 0 Then Exit For
'' Debug.Print ","; discFile(j, i); ''files(0, discFile(h, i));
'' Next
'' Debug.Print
Next
CurrentDb.Execute "DELETE * FROM comboEfficiency WHERE weekID = " & wk
CurrentDb.Execute "INSERT INTO comboEfficiency (weekID, calcDT, totFiles, bestSample, binVariation, discAvg, discMax, perDisc)" & _
"VALUES(" & wk & ",#" & Now() & "#," & (tfs + 1) & "," & smpl(10) & ",'" & IIf(updn(10), "Up and Down", "Sequential") & "'," & discAvg & "," & maxS(10) & ",'" & Left(txt, Len(txt) - 2) & "')"
Debug.Print "Print and Save manifest for week " & wk
txt = "SELECT * FROM manifest"
CurrentDb.Execute "DELETE * FROM manifest WHERE weekID = " & wk
Set rs = CurrentDb.OpenRecordset(txt, dbOpenDynaset)
'' reuse txt var
txt = ""
For bPtr = 0 To bLb
''Debug.Print
txt = txt & "DICS " & (bPtr + 1) & " Total " & Format(discSize(bPtr) / KB, "#,###") & " KB, with " & discPtr(bPtr) & " files." & vbCrLf
For i = 0 To tfs
If discFile(i, bPtr) > -1 Then
rs.AddNew
rs!weekID = wk ' week no
rs!DVD_no = bPtr ' which bin/disc is file going to
rs!fileID = files(0, discFile(i, bPtr)) ' the fileID from table [files]
rs.Update
End If
Next
Next
rs.Close
MsgBox txt
Debug.Print
Debug.Print
End Function
Function createComboList(ByRef startAtFile As Long, ByRef totalFiles As Long, ByRef samples As Long, ByRef binItem, ByRef binLoad, ByRef srcItems, ByRef comboCount) As Long
'' CREATE A LIST OF FILE COMBINATIONS TO MAKE A BEST FIT
'' startAtFile - lower bound of scrItems array
'' totalFiles - upper bound of scrItems array
'' samples - max number of sample r taken from number of files (totalFiles - startAtFile)
'' binItem - table array containing pointers to srcItems() - one row per combination
'' binLoad - total estimated size for each combination
'' srcItems - table array of files ( fileID, Size)
'' comboCount - count of combos per sample
'' general purpose counter, array pointer file,
Dim a As Long, b As Long, c As Long, d As Long, e As Long, f As Long, g As Long, h As Long, i As Long, j As Long, k As Long, n As Long
Dim dm As Long '' samples 0 based
Dim dc As Long '' sample counter 0 based
Dim t0 As Long '' srcItems lbound
Dim tf As Long '' tot files- ubound
Dim cb As Long '' number of combos
Dim tm As Single, t2 As Single '' time keepers
Dim timelim As Integer ' how many seconds to wait
Dim r As Double
timelim = 2
'' minimum requirement: 2 to 8 samples, and a minimum number of files
If samples < 2 Or samples > 9 Or totalFiles <= samples Then Exit Function
''samples = 7: totalFiles = 32 ' TESTING
tm = Timer
dm = samples - 1
t0 = startAtFile
tf = totalFiles
'' to find the best combination to fill the remaining space make a list of possibilities
cb = tf 'add the single files (combo(tf,1))
''Adding 32 combos, total 33 , took 0.078125 sec.
For dc = 2 To samples
''there is an undocumented hard limit for array around 512MB
If ((cb + combo(tf - t0 + 1, dc)) * dc * 2) > 134217728# Then '' to reduce processing time lets use 128MB
dc = dc - 1
Exit For
End If
cb = cb + combo(tf - t0 + 1, dc) '' how many combinations taken dc at a time
Next
dm = dc - 1
'' Total combinations 7 out of 32 = 4514872, use approx. 96,999 KB of RAM. 4514872 combos took 2.375 sec
ReDim binItem(dm, t0 To cb), binLoad(t0 To cb)
''Debug.Print " Estimate total combinations "; cb; " taken "; dm; " at a time, will use approx. "; Format((cb * ((dm * 2))), "#,###"); " bytes of RAM"
n = cb
cb = t0
'' single file
For a = t0 To tf
binItem(0, cb) = a
For dc = 1 To dm: binItem(dc, cb) = -1: Next
binLoad(cb) = srcItems(1, a)
cb = cb + 1
Next
comboCount(1) = cb - 1
If (Timer - tm) > 1 Then Exit Do
'' combine 2
If dm >= 1 Then
For a = t0 To (tf - 1)
For b = (a + 1) To tf
binItem(0, cb) = a: binItem(1, cb) = b
For dc = 2 To dm: binItem(dc, cb) = -1: Next
binLoad(cb) = srcItems(1, a) + srcItems(1, b)
cb = cb + 1
Next
Next
comboCount(2) = cb - 1
If (t2 * 3) > timelim Then Exit Do
End If
' combine 3
If dm > 2 Then
For a = t0 To (tf - 2)
For b = (a + 1) To (tf - 1)
For c = (b + 1) To tf
binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c
For dc = 3 To dm: binItem(dc, cb) = -1: Next
binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, b)
cb = cb + 1
Next
Next
Next
comboCount(3) = cb - 1
t2 = (Timer - tm)
If (t2 * 3) > timelim Then Exit Do
End If
' combine 4
If dm > 3 Then
For a = t0 To (tf - 3)
For b = (a + 1) To (tf - 2)
For c = (b + 1) To (tf - 1)
For d = (c + 1) To tf
binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d
For dc = 4 To dm: binItem(dc, cb) = -1: Next
binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d)
cb = cb + 1
Next
Next
Next
Next
comboCount(4) = cb - 1
t2 = (Timer - tm)
If (t2 * 3) > timelim Then Exit Do
End If
' combine 5
If dm > 4 Then
For a = t0 To (tf - 4)
For b = (a + 1) To (tf - 3)
For c = (b + 1) To (tf - 2)
For d = (c + 1) To (tf - 1)
For e = (d + 1) To tf
binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d: binItem(4, cb) = e
For dc = 5 To dm: binItem(dc, cb) = -1: Next
binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d) + srcItems(1, e)
cb = cb + 1
Next
Next
Next
Next
Next
comboCount(5) = cb - 1
t2 = (Timer - tm)
If (t2 * 3) > timelim Then Exit Do
End If
' combine 6
If dm > 5 Then
For a = t0 To (tf - 5)
For b = (a + 1) To (tf - 4)
For c = (b + 1) To (tf - 3)
For d = (c + 1) To (tf - 2)
For e = (d + 1) To (tf - 1)
For f = (e + 1) To tf
binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d: binItem(4, cb) = e: binItem(5, cb) = f
For dc = 6 To dm: binItem(dc, cb) = -1: Next
binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d) + srcItems(1, e) + srcItems(1, f)
cb = cb + 1
Next
Next
Next
Next
Next
Next
comboCount(6) = cb - 1
t2 = (Timer - tm)
If (t2 * 3) > timelim Then Exit Do
End If
' combine 7
If dm > 6 Then
For a = t0 To (tf - 6)
For b = (a + 1) To (tf - 5)
For c = (b + 1) To (tf - 4)
For d = (c + 1) To (tf - 3)
For e = (d + 1) To (tf - 2)
For f = (e + 1) To (tf - 1)
For g = (f + 1) To tf
binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d: binItem(4, cb) = e: binItem(5, cb) = f: binItem(6, cb) = g
For dc = 7 To dm: binItem(dc, cb) = -1: Next
binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d) + srcItems(1, e) + srcItems(1, f) + srcItems(1, g)
cb = cb + 1
Next
Next
Next
Next
Next
Next
Next
comboCount(7) = cb - 1
t2 = (Timer - tm)
If (t2 * 3) > timelim Then Exit Do
End If
' combine 8
If dm > 7 Then
For a = t0 To (tf - 7)
For b = (a + 1) To (tf - 6)
For c = (b + 1) To (tf - 5)
For d = (c + 1) To (tf - 4)
For e = (d + 1) To (tf - 3)
For f = (e + 1) To (tf - 2)
For g = (f + 1) To (tf - 1)
For h = (g + 1) To tf
binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d: binItem(4, cb) = e: binItem(5, cb) = f: binItem(6, cb) = g: binItem(7, cb) = h
For dc = 8 To dm: binItem(dc, cb) = -1: Next
binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d) + srcItems(1, e) + srcItems(1, f) + srcItems(1, g) + srcItems(1, h)
cb = cb + 1
Next
Next
Next
Next
Next
Next
Next
Next
comboCount(8) = cb - 1
t2 = (Timer - tm)
If (t2 * 3) > timelim Then Exit Do
End If
' combine 9
If dm > 8 Then
For a = t0 To (tf - 8)
For b = (a + 1) To (tf - 7)
For c = (b + 1) To (tf - 6)
For d = (c + 1) To (tf - 5)
For e = (d + 1) To (tf - 4)
For f = (e + 1) To (tf - 3)
For g = (f + 1) To (tf - 2)
For h = (g + 1) To (tf - 1)
For i = (h + 1) To tf
binItem(0, cb) = a: binItem(1, cb) = b: binItem(2, cb) = c: binItem(3, cb) = d: binItem(4, cb) = e: binItem(5, cb) = f: binItem(6, cb) = g: binItem(7, cb) = h: binItem(8, cb) = i
For dc = 9 To dm: binItem(dc, cb) = -1: Next
binLoad(cb) = srcItems(1, a) + srcItems(1, b) + srcItems(1, c) + srcItems(1, d) + srcItems(1, e) + srcItems(1, f) + srcItems(1, g) + srcItems(1, h) + srcItems(1, i)
cb = cb + 1
Next
Next
Next
Next
Next
Next
Next
Next
Next
comboCount(9) = cb - 1
t2 = (Timer - tm)
'If (t2 * 3) > 1 Then Exit Do
End If
Exit Do ' execute loop only once
Loop
cb = cb - 1
createComboList = cb ' return the ubound of the combo array
If (cb) <> n Then ' if we skip a level due to time
ReDim Preserve binItem(dm, cb), binLoad(cb)
End If
Debug.Print " Total combinations"; dm; "out of"; totalFiles; "= " & n & ", use approx. "; Format((n * (8 + (samples * 2))) / 1024, "#,###"); " KB of RAM. " & cb & " combos took " & (Timer - tm) & " sec"
End Function
Function combo(ByVal n As Long, ByVal r As Long) As Long
Dim a As Double, b As Long, i As Long
If r > 0 And n > r Then
'C(n,r) = n! / (r! ( n - r )! )
a = 1
b = 1
For i = 1 To r
a = a * n
n = n - 1
b = b * i
Next
combo = a / b
End If
End Function
'''
Upvotes: 0
Reputation: 283
I recently ran an experiment to find a formula to do a similar filling estimate on dvds, and found a simple formula given some assumptions... from your original post this formula will likely be a low number for you, it sounds like you have multiple directories and longer file names.
Assumptions:
The formula:
174 + floor(count / 42) + sum( ceil(file_size / 2048) )
An example script:
#!/usr/bin/perl -w
use strict;
use POSIX;
sub sum {
my $out = 0;
for(@_) {
$out += $_;
}
return $out;
}
my @sizes = ( 2048 ) x 1000;
my $file_count = @sizes;
my $data_size = sum(map { ceil($_ / 2048) } @sizes);
my $dir_size = floor( $file_count / 42 ) + 1;
my $overhead = 173;
my $size = $overhead + $dir_size + $data_size;
$\ = "\n";
print $size;
I verified this on disks with up to 150k files, with sizes ranging from 200 bytes to 1 MiB.
Upvotes: 1
Reputation: 51226
Thanks for the detailed update. I'm satisfied that your current bin-packing strategy is pretty efficient.
As to the question, "Exactly how much overhead does an ISO 9660 filesystem pack on for n files totalling b bytes?" there are only 2 possible answers:
Actually, there is a third answer:
(3) You don't really care about using every last byte on each DVD. In that case, grab a small representative handful of files of different sizes (say 5), pad them till they are multiples of 2048 bytes, and put all 2^5 possible subsets through genisoimage -print-size
. Then fit the equation nx + y = iso_size - total_input_size on that dataset, where n = number of files in a given run, to find x, which is the number of bytes of overhead per file, and y, which is the constant amount of overhead (the size of an ISO 9660 filesystem containing no files). Round x and y up and use that formula to estimate your ISO filesystem sizes for a given set of files. For safety, make sure you use the longest filenames that appear anywhere in your collection for the test filenames, and put each one under a separate directory hierarchy that is as deep as the deepest hierarchy in your collection.
Upvotes: 2
Reputation: 202505
Nice thinking, J. Random. Of course I don't need every last byte, this is mostly for fun (and bragging rights at lunch). I want to be able to type du
at the CD-ROM and have it very close to 4700000000.
I looked at the ECMA spec but like most specs it's medium painful and I have no confidence in my ability to get it right. Also it appears not to discuss Rock Ridge extensions, or if it does, I missed it.
I like your idea #3 and think I will carry it a bit further: I'll try to build a fairly rich model of what's going on and then use genisoimage -print-size
on a number of filesets to estimate the parameters of the model. Then I can use the model to do my estimation. This is a hobby project so it will take a while, but I will get around to it eventually. I will post an answer here to say how much wastage is eliminated!
Upvotes: 0
Reputation: 51226
I'm not sure exactly how you are currently doing this -- according to my googling, "Bubblesearch" refers to a way to choose an ordering of items that is in some sense near a greedy ordering, but in your case, the order of adding files to a DVD does not change the space requirements so this approach wastes time considering multiple different orders that amount to the same set of files.
In other words, if you are doing something like the following to generate a candidate file list:
Then you are searching the solution space inefficiently -- for any final candidate set of n files, you are potentially considering all n! ways of producing that set. My suggestion:
Repeat this many times and choose the best file list.
Tyler D's suggestion is also good: if you have ~40000 files totalling ~500Mb, that means an average file size of 12.5Kb. ISO 9660 uses a block size of 2Kb, meaning those files are wasting on average 1Kb of disk space, or about 8% of their size. So packing them together with tar first will save around 8% of space.
Upvotes: 2
Reputation: 141
Can't use tar to store the files on disk? It's unclear if you're writing a program to do this, or simply making some backups.
Maybe do some experimentation and err on the side of caution - some free space on a disk wouldn't hurt.
Somehow I imagine you've already considered these, or that my answer is missing the point.
Upvotes: 1