Dawid Ohia

Reputation: 16435

Many files in one directory?

I'm developing a PHP project on Linux. Are there any disadvantages to putting several thousand images (files) in one directory? This is a closed set which won't grow. The alternative would be to separate these files using a directory structure based on some ID (this way there would be, let's say, only 100 files in one directory).
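For illustration, the ID-based alternative could look roughly like the sketch below. The function name `idPath`, the `.jpg` extension, and the assumption that IDs are non-negative integers are all made up for this example.

```php
<?php
// Hypothetical helper: groups numeric IDs into directories holding
// at most $perDir files each (IDs 0-99 go to "0/", 100-199 to "1/", ...).
// Assumes non-negative integer IDs and a fixed ".jpg" extension.
function idPath(int $id, int $perDir = 100): string
{
    $bucket = intdiv($id, $perDir);
    return sprintf('%d/%d.jpg', $bucket, $id);
}

echo idPath(4217), PHP_EOL; // prints "42/4217.jpg"
```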

I ask because I often see such separation when I look at image URLs on different sites. The directories are split so that no more than a few hundred images end up in any one directory.

What would I gain by not putting several thousand files (from a set that won't grow) in one directory, but separating them into groups of e.g. 100? Is it worth complicating things?

UPDATE:

VALUABLE INFORMATION FROM THE ANSWERS:

Why separate many files to different directories:

Upvotes: 7

Views: 2046

Answers (7)

GSP

Reputation: 3789

I think there are two aspects to this question:

  1. Does the Linux file system that you're using efficiently support directories with thousands of files? I'm not an expert, but I think the newer file systems won't have problems.

  2. Are there performance issues with specific PHP functions? I think direct access to files should be okay, but if you're doing directory listings then you might eventually run into time or memory problems.
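On the second point, one way to keep memory use flat is to walk the directory one entry at a time instead of loading every name into an array (as `scandir()` does). A minimal sketch, where `countFiles` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: counts regular files in $dir one entry at a time,
// without first building an array of all file names.
function countFiles(string $dir): int
{
    $count = 0;
    // SKIP_DOTS leaves out the "." and ".." entries.
    $entries = new FilesystemIterator($dir, FilesystemIterator::SKIP_DOTS);
    foreach ($entries as $entry) {
        // Each $entry is an SplFileInfo; subdirectories are not counted.
        if ($entry->isFile()) {
            $count++;
        }
    }
    return $count;
}

printf("%d files in %s\n", countFiles(__DIR__), __DIR__);
```

The iteration still takes time proportional to the number of entries, so this helps with memory rather than with speed.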

Upvotes: 1

Omry Yadan

Reputation: 33596

Usually the reason for such splitting is file system performance. For a closed set of 5000 files I'm not sure it's worth the hassle. I suggest you try the simple approach of putting all the files in one directory, but keep an eye on the actual time it takes to access the files.

If you see that it's not fast enough for your needs, you can split it like you suggested.

I have had to split files myself for performance reasons. In addition, I bumped into a 32k files-per-directory limit when using ext3 over NFS (not sure if it's a limit of NFS or ext3), so that's another reason to split into multiple directories. In any case, try with a single dir and only split if you see it's not fast enough.
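To keep an eye on the access time, something along these lines could serve as a rough check. It is only a sketch: `averageAccessTime` is a made-up helper name, the image path is a placeholder, and the operating system's page cache will make repeated runs look faster than a cold read.

```php
<?php
// Hypothetical helper: average wall-clock seconds per file to stat and
// read each of the given paths. Returns 0.0 for an empty list.
function averageAccessTime(array $paths): float
{
    if ($paths === []) {
        return 0.0;
    }
    $start = microtime(true);
    foreach ($paths as $path) {
        // Drop PHP's own stat cache entry so is_file() really hits the file system.
        clearstatcache(true, $path);
        if (is_file($path)) {
            file_get_contents($path);
        }
    }
    return (microtime(true) - $start) / count($paths);
}

// Placeholder directory; glob() may return false on error, hence the fallback.
$paths = glob('/path/to/images/*') ?: [];
printf("%.6f s per file over %d files\n", averageAccessTime($paths), count($paths));
```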

Upvotes: 2

Xorlev

Reputation: 8643

If changing the filesystem is an option, I'd recommend moving wherever you store all the images to a ReiserFS filesystem. It is excellent at fast storage/access of lots of small files.

If not, MightyE's response of breaking them into folders is the most logical and will improve access times by a considerable margin.

Upvotes: 0

poke

Reputation: 387507

There is no reason to split those files into multiple directories if you don't expect any filename conflicts and don't need to iterate over those images at any point.

But still, if you can think of a suggestive categorization, it's not a bad idea to sort the images a bit, even if it is just for maintenance reasons.

Upvotes: 1

codeholic

Reputation: 5848

Several thousand images are still okay. When you access a directory, the operating system reads the listing of its files in blocks of 4K. With a flat directory structure, it may take time to read the whole file listing if there are many (e.g. a hundred thousand) files in it.

Upvotes: 0

MightyE

Reputation: 2679

In addition to faster file access by separating images into subdirectories, you also dramatically extend the number of files you can track before hitting the natural limits of the filesystem.

A simple approach is to md5() the file name, then use the first n characters as the directory name (e.g., substr(md5($filename), 0, 2) for n = 2). This ensures a reasonably even distribution (vs. taking the first n characters of the raw filename).
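As a minimal sketch of that idea (the function name `shardedPath` and its `$depth` parameter are made up for illustration):

```php
<?php
// Hypothetical helper: maps a file name to "<prefix>/<filename>", where
// <prefix> is the first $depth hex characters of the name's md5 hash.
function shardedPath(string $filename, int $depth = 2): string
{
    $prefix = substr(md5($filename), 0, $depth);
    return $prefix . '/' . $filename;
}

echo shardedPath('photo_1234.jpg'), PHP_EOL;
```

With two hex characters there are 16 * 16 = 256 possible directories, so a few thousand files would come out at roughly a couple of dozen per directory. The same function is used for both writing and reading, so no lookup table is needed.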

Upvotes: 7

Gordon

Reputation: 316939

The only case I can imagine where it would be detrimental is when iterating over the directory: more files means more iterations. But that's basically all I can think of from a programming perspective.

Upvotes: 0
