Reputation: 248
First please dont overlook because you might think it as common question, this is not. I know how to find out size of file and directory using file.length
and Apache FileUtils.sizeOfDirectory
.
My problem is, in my case files and directory size is too big (in hundreds of mb). When I try to find out size using above code (e.g. creating file object) then my program becomes so much resource hungry and slows down the performance.
Is there any way to know the size of file without creating object?
I am using for files File file1 = new file(fileName); long size = file1.length();
and for directory, File dir1 = new file (dirPath); long size = fileUtils.sizeOfDirectiry(dir1);
I have one parameter which enables size computing. If parameter is false then it goes smoothly. If false then program lags or hangs.. I am calculating size of 4 directory and 2 database files.
Upvotes: 3
Views: 7478
Reputation: 248
Answering my own question..
This is not the best solution but works in my case..
I have created a batch script to get the size of the directory and then read it in java program. It gives me less execution time when number of files in directory are more then 1L (That is always in my case).. sizeOfDirectory takes around 30255 ms and with batch script i get 1700 ms.. For less number of files batch script is costly.
Upvotes: 1
Reputation: 2656
We had a similar performance problem with File.listFiles() on directories with large number of files.
Our setup was one folder with 10 subfolders each with 10,000 files. The folder was on a network share and not on the machine running the test.
We were using a FileFilter to only accept files with known extensions or a directory so we could recourse down the directories.
Profiling revealed that about 70% of the time was spent calling File.isDirectory (which I assume Apache is calling). There were two calls to isDirectory for each file (one in the filter and one in the file processing stage).
File.isDirectory was slow cause it had to hit the network share for each file.
Reversing the order of the check in the filter to check for valid name before valid directory saved a lot of time, but we still needed to call isDirectory for the recursive lookup.
My solution was to implement a version of listFiles in native code, that would return a data structure that contained all the metadata about a file instead of just the filename like File does.
This got rid of the performance problem but added a maintenance problem of having to native code maintained by Java developers (lucking we only supported one OS).
Upvotes: 2
Reputation: 533510
A File is just a wrapper for the file path. It doesn't matter how big the file is only its file name.
When you want to get the size of all the files in a directory, the OS needs to read the directory and then lookup each file to get its size. Each access takes about 10 ms (because that's a typical seek time for a hard drive) So if you have 100,000 file it will take you about 17 minutes to get all their sizes.
The only way to speed this up is to get a faster drive. e.g. Solid State Drives have an average seek time of 0.1 ms but it would still take 10 second or more to get the size of 100K files.
BTW: The size of each file doesn't matter because it doesn't actually read the file. Only the file entry which has it s size.
EDIT: For example, if I try to get the sizes of a large directory. It is slow at first but much faster once the data is cached.
$ time du -s /usr
2911000 /usr
real 0m33.532s
user 0m0.880s
sys 0m5.190s
$ time du -s /usr
2911000 /usr
real 0m1.181s
user 0m0.300s
sys 0m0.840s
$ find /usr | wc -l
259934
The reason the look up is so fast the fist time is that the files were all installed at once and most of the information is available continuously on disk. Once the information is in memory, it takes next to no time to read the file information.
Timing FileUtils.sizeOfDirectory("/usr") take under 8.7 seconds. This is relatively slow compared with the time it takes du, but it is processing around 30K files per second.
An alterative might be to run Runtime.exec("du -s "+directory);
however, this will only make a few seconds difference at most. Most of the time is likely to be spent waiting for the disk if its not in cache.
Upvotes: 3
Reputation: 9652
I'll add to what Peter Lawrey answered and add that when a directory has a lot of files inside it (directly, not in sub dirs) - the time it takes for file.listFiles()
it extremely slow (I don't have exact numbers, I know it from experience). The amount of files has to be large, several thousands if I remember correctly - if this is your case, what fileUtils
will do is actually try to load all of their names at once into memory - which can be consuming.
If that is your situation - I would suggest restructuring the directory to have some sort of hierarchy that will ensure a small number of files in each sub-directory.
Upvotes: 0
Reputation: 346300
File objects are very lightweight. Either there is something wrong with your code, or the problem is not with the file objects but with the HD access necessary for getting the file size. If you do that for a large number of files (say, tens of thousands), then the harddisk will do a lot of seeks, which is pretty much the slowest operation possible on a modern PC (by several orders of magnitude).
Upvotes: 4
Reputation: 2742
I think that you need to read the Meta-Data of a file. Read this tutorial for more information. This might be the solution you are looking for: http://download.oracle.com/javase/tutorial/essential/io/fileAttr.html
Upvotes: 1