Reputation: 11
Hey all. I'm creating an application that is going to be generating and storing millions of images. Before I start on this, I'm wondering if anyone knows if it's better to generate more folders and only keep a few files in each, or should I use a few folders and fill them up with lots of files?
The generator will be written in C++ and the files will be accessed directly via GET requests.
Thanks, Steve
Upvotes: 1
Views: 2251
Reputation: 1
@dmckee No clicks, as the images all load automatically. Think mapping software.
@Brian Agnew It will run/served on some sort of Linux cloud thing. I'm not an IT guy by any stretch of the imagination, just the programmer. But it will definitely be scaled out to a bunch of machines.
@Onkelborg I concur. My inclination has been to go with more folders and less files, as well. I'm thinking the layout would be something like...
set/zoom-level/column/row.jpg
I wanted to use the filename/directory structure to pull files without querying a server. If we're zoomed in by a factor of five and the top-left coordinate within this larger image is 25,600 x 15,360, then given a 256-pixel square tile, some basic math would give me this URL:
2389/5/20/12.jpg
Where "2389" is a tile-set ID. So you can see images would only be stored in directories three levels deep. The directories with images would hold maybe 4 to ~100 images, based on zoom level. Or maybe a dozen to a few hundred (with slightly fewer folders), if I went this way...
set/zoom-level/row/column.jpg
I came across a system that used a similar quad-tree layout and noticed that it broke out into new folders at odd, non-systematic spots, which made me think they did it for performance reasons or other limitations.
As I've written this, I think I'm realizing that the first layout is probably the way to go. It's fewer items to iterate through to find the requested file. I'm just thinking of fragmentation, but I guess that will be IT's job. ;)
Upvotes: 0
Reputation: 101171
Things that come to mind:
Pro "fewer folders":
Pro "more folders":
Optimizing against these competing pressures will depend on knowing what a typical use pattern looks like, which means you may have to guess initially.
But just for convenient display on the screen I'd suggest more than a handful and fewer than a hundred entries per directory. Then you can collect statistics and adjust from there.
Upvotes: 0
Reputation: 272217
As ever, you need to run some tests with various scenarios on your particular deployment platform. Note that you've not mentioned which OS/filesystem etc. that you're running on.
I would generally implement some balance between a deeply nested hierarchy (fast, but possibly difficult to manage) and a flat hierarchy with everything stored in one directory. The latter has caused me performance problems on most platforms in the past. How much data you need to store and how performant your solution needs to be will dictate how you structure your directories, and some experimentation will give you pointers here.
Upvotes: 0
Reputation: 3997
In terms of speed, manageability etc.: go with more folders. If you examine a few big applications, you'll see that they generally split the files up across many folders. Most applications and/or file systems don't like too many files in one folder. From a programmer's point of view, it doesn't matter.
Upvotes: 2