Reputation: 1300
I'm trying to determine the best way to store large numbers of small .mat files, around 9000 objects with sizes ranging from 2k to 100k, for a total of around half a gig.
The typical use case is that I only need to pull a small number (say 10) of the files from disk at a time.
What I've tried:
Method 1: If I save each file individually, I get performance problems (very slow save times and system sluggishness for some time after) as Windows 7 has difficulty handling so may files in a folder (And I think my SSD is having a rough time of it, too). However, the end result is fine, I can load what I need very quickly. This is using '-v6' save.
Method 2: If I save all of the files in one .mat file and then load just the variables I need, access is very slow (loading takes around three quarters of the time it takes to load the whole file, with small variation depending on the ordering of the save). This is using '-v6' save, too.
I know I could split the files up into many folders but it seems like such a nasty hack (and won't fix the SSD's dislike of writing many small files), is there a better way?
Edit: The objects are consist mainly of a numeric matrix of double data and an accompanying vector of uint32 identifiers, plus a bunch of small identifying properties (char and numeric).
Upvotes: 4
Views: 1303
Reputation: 1300
The solution I have come up with is to save object arrays of around 100 of the objects each. These files tend to be 5-6 meg so loading is not prohibitive and access is just a matter of loading the right array(s) and then subsetting them to the desired entry(ies). This compromise avoids writing too many small files, still allows for fast access of single objects and avoids any extra database or serialization overhead.
Upvotes: 0
Reputation: 20560
Five ideas to consider:
save
to save objects in two places).Update: The OP has mentioned custom objects. There are two methods to consider for serializing these:
Upvotes: 2
Reputation: 22914
Try storing them as blobs in a database.
I would also try the multiple folders method as well - it might perform better than you think. It might also help with organization of the files if that's something you need.
Upvotes: 1