ThinkingMonkey

Reputation: 12727

readdir vs scandir

1] Which of the functions is faster?
2] What are the differences?

Differences

1] readdir returns the name of the next entry in the directory. scandir returns an array of all the files and directories in the directory.

2] readdir needs a resource handle kept open until all the entries have been read. scandir presumably builds an array of all the entries and then closes the resource handle? (See the sketch below.)
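
For reference, the two usage patterns look roughly like this (the path is just an example):

$dir = '/tmp'; // example path

// scandir: a single call returns the whole listing as a sorted array (includes "." and "..")
foreach (scandir($dir) as $entry) {
    echo $entry, PHP_EOL;
}

// readdir: keep a handle open and read one entry per call
$handle = opendir($dir);
if ($handle !== false) {
    while (false !== ($entry = readdir($handle))) {
        echo $entry, PHP_EOL;
    }
    closedir($handle); // the handle can be closed once all entries are read
}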

Upvotes: 36

Views: 31731

Answers (5)

Vitaly

Reputation: 651

I know this question may no longer be relevant, but to add to it I have run some tests (like Aufziehvogel and Sayahan) with one small difference: on a directory containing 1,000,000 small files (a few bytes each).

$dir = dirname(__FILE__) . '/dir';

$startScan = microtime(true);
$array = scandir($dir);
for ($i = 0, $j = count($array); $i < $j; $i++) {
    // Code
}
$endScan = microtime(true);
unset($array);

$startRead = microtime(true);
$handle = opendir($dir);
while (false !== ($entry = readdir($handle))) {
    // Code
}
$endRead = microtime(true);
unset($handle);
unset($entry);

$startDir = microtime(true);
$files = new DirectoryIterator($dir);
foreach ($files as $file) {
    // Code
}
$endDir = microtime(true);
unset($files);

echo 'scandir:           ', ($endScan - $startScan), PHP_EOL;
echo 'readdir:           ', ($endRead - $startRead), PHP_EOL;
echo 'DirectoryIterator: ', ($endDir - $startDir), PHP_EOL;

Results (HDD):

scandir:           1.9403479099274
readdir:           0.79462885856628
DirectoryIterator: 0.5853099822998

Results (SSD):

scandir:           0.83593201637268
readdir:           0.35835003852844
DirectoryIterator: 0.28022909164429

CPU: AMD A10-4600M APU with Radeon(tm) HD Graphics (4 cores)
MEM: 8G
PHP: 5.6.29

Upvotes: 10

Sayahan

Reputation: 41

I have done some tests. (Thanks to Aufziehvogel for the test setup.)

$count = 100000;

$dir = dirname(__FILE__);

$startScan = microtime(true);
for ($i=0;$i<$count;$i++) {
    $array = scandir($dir);
}
$endScan = microtime(true);

$startRead = microtime(true);
for ($i=0;$i<$count;$i++) {
    $handle = opendir($dir);
    while (false !== ($entry = readdir($handle))) {
        // We do not know what to do                    
    }
}
$endRead = microtime(true);

$startGlob = microtime(true);
for ($i=0;$i<$count;$i++) {
    $array3 = glob('*');
}
$endGlob = microtime(true);

echo "scandir: " . ($endScan-$startScan) . "\n";
echo "readdir: " . ($endRead-$startRead) . "\n";
echo "glob   : " . ($endGlob-$startGlob) . "\n";

Linux Server Results:

scandir: 0.82553291320801
readdir: 0.91677618026733
glob   : 0.76309990882874

These results are from a 4-core (8-thread) Intel E3-1240 CPU on a Linux + Apache server.

On a Windows server, however, the results are the opposite. Windows + Apache server - Intel Q8400, 4 cores (4 threads):

Windows Server Results:

$count = 10000; // it was 100000 on Linux :)

scandir: 0.61557507515
readdir: 0.614650011063
glob   : 1.92112612724

(The folder contains 13 files. With more files, the results may differ.)

Upvotes: 4

Erik Liljencrantz

Reputation: 71

I did some more timing comparisons for reading an entire directory tree with plenty of files and directories:

  • the filetype() == "dir" call is clearly faster than the is_dir() call

  • the opendir/readdir calls are much faster than the RecursiveDirectoryIterator (a rough sketch of such a tree walk follows this list)

  • building the directory tree with recursive depth-first calls or linearly makes no difference
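
As a rough sketch of the opendir/readdir + filetype() tree walk these points refer to (assumed structure, not the original test code):

// Walk a directory tree with opendir/readdir, using filetype() === 'dir' to recurse
function walkTree($dir, array &$entries)
{
    $handle = opendir($dir);
    if ($handle === false) {
        return;
    }
    while (false !== ($entry = readdir($handle))) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $path = $dir . '/' . $entry;
        $entries[] = $path;
        if (filetype($path) === 'dir') { // reported faster than is_dir() in these tests
            walkTree($path, $entries);
        }
    }
    closedir($handle);
}

$entries = [];
walkTree('/some/dir', $entries); // example path
echo count($entries), " entries", PHP_EOL;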

These tests were performed on Windows on a local SSD, a local USB drive, and a network drive, with consistent results. Running against the network drive was up to 180 times slower than against the local drives - despite a gigabit link and an otherwise fast ReadyNAS unit!

The number of entries handled per second ranged from 115 for the slowest code against the network drive to almost 65,000 for the fastest code against the USB 3.0 drive - thanks to caching, of course.

But the huge difference for the network drive makes you wonder what happens inside PHP, since a simple dir command (or ls on Linux) over the same files is much quicker.

To be continued...

Upvotes: 3

aufziehvogel

Reputation: 7297

Just fetching the entries (without doing anything with them), readdir is marginally faster:

<?php

$count = 10000;

$dir = '/home/brati';

$startScan = microtime(true);
for ($i=0;$i<$count;$i++) {
    $array = scandir($dir);
}
$endScan = microtime(true);


$startRead = microtime(true);
for ($i=0;$i<$count;$i++) {
    $handle = opendir($dir);
    while (false !== ($entry = readdir($handle))) {
        // We do not know what to do
    }
}
$endRead = microtime(true);

echo "scandir: " . ($endScan-$startScan) . "\n";
echo "readdir: " . ($endRead-$startRead) . "\n";

Gives:

== RUN 1 ==
scandir: 5.3707950115204
readdir: 5.006147146225

== RUN 2 ==
scandir: 5.4619920253754
readdir: 4.9940950870514

== RUN 3 ==
scandir: 5.5265231132507
readdir: 5.1714680194855

Then of course it depends on what you intend to do. If you have to write another loop over the scandir() result, it will be slower (see the sketch below).
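
To illustrate (a minimal sketch, not part of the benchmark above; process() stands in for whatever you do per entry): with readdir the work happens inside the same loop that reads the entries, while scandir first builds the full array internally before you loop over it.

// readdir: process each entry in the same loop that reads it
$handle = opendir($dir);
while (false !== ($entry = readdir($handle))) {
    process($entry); // hypothetical per-entry work
}
closedir($handle);

// scandir: the array is built first (inside scandir), then looped over a second time
foreach (scandir($dir) as $entry) {
    process($entry); // hypothetical per-entry work
}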

Upvotes: 20

It really depends on what you're doing with the data.

If you're going through the entries one by one, you should be using readdir; if you actually need the whole list of entries in memory, you should be using scandir.

There's no sense copying information into memory when you're going to be using it entry-by-entry anyway. Lazy evaluation is definitely the way to go in that case.
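
As an example of that lazy, entry-by-entry style (a minimal sketch assuming PHP 5.5+ generators; the .log filter is just an illustration):

// Yield entries one at a time instead of building a full array in memory
function lazyEntries($dir)
{
    $handle = opendir($dir);
    if ($handle === false) {
        return;
    }
    while (false !== ($entry = readdir($handle))) {
        yield $entry;
    }
    closedir($handle);
}

foreach (lazyEntries('/tmp') as $entry) { // example path
    if (substr($entry, -4) === '.log') {  // hypothetical filter
        echo $entry, PHP_EOL;
    }
}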

I would imagine that scandir is just a wrapper around the same thing that readdir is calling, and would therefore be slower.
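
Roughly speaking, the intuition is that scandir behaves like this userland sketch (not the actual C implementation):

// Userland approximation of scandir($dir): read everything, sort, return the array
function scandirLike($dir)
{
    $handle = opendir($dir);
    if ($handle === false) {
        return false;
    }
    $entries = [];
    while (false !== ($entry = readdir($handle))) {
        $entries[] = $entry;
    }
    closedir($handle);
    sort($entries); // scandir sorts alphabetically (ascending) by default
    return $entries;
}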

Upvotes: 17
